Preface



The study of matrices occupies a singular place within mathematics. It 
is still an area of active research, and it is used by every mathematician 
and by many scientists working in various specialities. Several examples 
illustrate its versatility: 

• Scientific computing libraries began growing around matrix calculus. 
As a matter of fact, the discretization of partial differential operators 
is an endless source of linear finite-dimensional problems. 

• At a discrete level, the maximum principle is related to nonnegative 
matrices. 

• Control theory and stabilization of systems with finitely many degrees 
of freedom involve spectral analysis of matrices. 

• The discrete Fourier transform, including the fast Fourier transform, 
makes use of Toeplitz matrices. 

• Statistics is widely based on correlation matrices. 

• The generalized inverse is involved in least-squares approximation. 

• Symmetric matrices are inertia, deformation, or viscous tensors in 
continuum mechanics. 

• Markov processes involve stochastic or bistochastic matrices. 

• Graphs can be described in a useful way by square matrices. 





• Quantum chemistry is intimately related to matrix groups and their 
representations. 

• The case of quantum mechanics is especially interesting: Observables
are Hermitian operators, and the eigenvalues of the energy observable are
the energy levels. In the early years, quantum mechanics was called
“mechanics of matrices,” and it has now given rise to the development of
the theory of large random matrices. See [23] for a thorough account of
this fashionable topic.

This text was conceived during the years 1998-2001, on the occasion of
a course that I taught at the École Normale Supérieure de Lyon. As such,
every result is accompanied by a detailed proof. During this course I tried
to investigate all the principal mathematical aspects of matrices: algebraic,
geometric, and analytic.

In some sense, this is not a specialized book. For instance, it is not as
detailed as [19] concerning numerics, or as [35] on eigenvalue problems,
or as [21] about Weyl-type inequalities. But it covers, at a slightly higher
than basic level, all these aspects, and is therefore well suited for a
graduate program. Students attracted by more advanced material will find
one or two deeper results in each chapter but the first one, given with full
proofs. They will also find further information in about half of the
170 exercises. The solutions for exercises are available on the author’s site
http://www.umpa.ens-lyon.fr/~serre/exercises.pdf.

This book is organized into ten chapters. The first three contain the 
basics of matrix theory and should be known by almost every graduate 
student in any mathematical field. The other parts can be read more or 
less independently of each other. However, exercises in a given chapter 
sometimes refer to the material introduced in another one. 

This text was first published in French by Masson (Paris) in 2000, under 
the title Les Matrices: théorie et pratique. I have taken the opportunity
during the translation process to correct typos and errors, to index a list 
of symbols, to rewrite some unclear paragraphs, and to add a modest 
amount of material and exercises. In particular, I added three sections, 
concerning alternate matrices, the singular value decomposition, and the 
Moore-Penrose generalized inverse. Therefore, this edition differs from the 
French one by about 10 percent of the contents. 

Acknowledgments. Many thanks to the École Normale Supérieure de Lyon
and to my colleagues who have had to put up with my talking to them 
so often about matrices. Special thanks to Sylvie Benzoni for her constant 
interest and useful comments. 



Lyon, France 
December 2001 
Elementary Theory



1.1 Basics 

1.1.1 Vectors and Scalars 

Fields. Let $(K, +, \cdot)$ be a field. It could be $\mathbb{R}$, the field of real numbers, $\mathbb{C}$
(complex numbers), or, more rarely, $\mathbb{Q}$ (rational numbers). Other choices
are possible, of course. The elements of $K$ are called scalars.

Given a field $k$, one may build larger fields containing $k$: algebraic
extensions $k(a_1, \dots, a_n)$, fields of rational fractions $k(X_1, \dots, X_n)$, fields of
formal power series $k[[X_1, \dots, X_n]]$. Since they are rarely used in this book,
we do not define them and let the reader consult his or her favorite textbook
on abstract algebra.

The digits 0 and 1 have the usual meaning in a field $K$, with $0 + x = 1 \cdot x = x$.
Let us consider the subring $\mathbb{Z}1$, composed of all sums (possibly
empty) of the form $\pm(1 + \cdots + 1)$. Then $\mathbb{Z}1$ is isomorphic to either $\mathbb{Z}$ or
to a field $\mathbb{Z}/p\mathbb{Z}$. In the latter case, $p$ is a prime number, and we call it the
characteristic of $K$. In the former case, $K$ is said to have characteristic 0.

Vector spaces. Let $(E, +)$ be a commutative group. Since $E$ is usually
not a subset of $K$, it is an abuse of notation that we use $+$ for the additive
laws of both $E$ and $K$. Finally, let $K \times E \to E$, $(a, x) \mapsto ax$, be a map such that

$$(a + b)x = ax + bx, \qquad a(x + y) = ax + ay.$$

One says that $E$ is a vector space over $K$ (one often speaks of a $K$-vector
space) if moreover,

$$a(bx) = (ab)x, \qquad 1x = x,$$

hold for all $a, b \in K$ and $x \in E$. The elements of $E$ are called vectors. In a
vector space one always has $0x = 0$ (more precisely, $0_K x = 0_E$).

When $P, Q \subset K$ and $F, G \subset E$, one denotes by $PQ$ (respectively $P + Q$,
$F + G$, $PF$) the set of products $pq$ as $(p, q)$ ranges over $P \times Q$ (respectively
$p + q$, $f + g$, $pf$ as $p, q, f, g$ range over $P, Q, F, G$). A subgroup $(F, +)$ of $(E, +)$
that is stable under multiplication by scalars, i.e., such that $KF \subset F$, is
again a $K$-vector space. One says that it is a linear subspace of $E$, or just a
subspace. Observe that $F$, as a subgroup, is nonempty, since it contains $0_E$.
The intersection of any family of linear subspaces is a linear subspace. The
sum $F + G$ of two linear subspaces is again a linear subspace. The trivial
formula $(F + G) + H = F + (G + H)$ allows us to define unambiguously
$F + G + H$ and, by induction, the sum of any finite family of subsets of $E$.
When these subsets are linear subspaces, their sum is also a linear subspace.

Let $I$ be a set. One denotes by $K^{(I)}$ the set of maps $a = (a_i)_{i \in I} : I \to K$
where only finitely many of the $a_i$'s are nonzero. This set is naturally
endowed with a $K$-vector space structure, by the addition and product
laws

$$(a + b)_i := a_i + b_i, \qquad (\lambda a)_i := \lambda a_i.$$

Let $E$ be a vector space and let $i \mapsto f_i$ be a map from $I$ to $E$. A linear
combination of $(f_i)_{i \in I}$ is a sum

$$\sum_{i \in I} a_i f_i,$$

where the $a_i$'s are scalars, only finitely many of which are nonzero (in other
words, $(a_i)_{i \in I} \in K^{(I)}$). This sum involves only finitely many terms. It is a
vector of $E$. The family $(f_i)_{i \in I}$ is free if every linear combination but the
trivial one (when all coefficients are zero) is nonzero. It is a generating
family if every vector of $E$ is a linear combination of its elements. In other
words, $(f_i)_{i \in I}$ is free (respectively generating) if the map

$$K^{(I)} \to E, \qquad (a_i)_{i \in I} \mapsto \sum_{i \in I} a_i f_i,$$

is injective (respectively onto). Last, one says that $(f_i)_{i \in I}$ is a basis of $E$ if
it is free and generating. In that case, the above map is bijective, and it is
actually an isomorphism between vector spaces.
If $G \subset E$, one often identifies $G$ and the associated family $(g)_{g \in G}$. The set
$\overline{G}$ of linear combinations of elements of $G$ is a linear subspace of $E$, called the
linear subspace spanned by $G$. It is the smallest linear subspace of $E$ containing $G$,
equal to the intersection of all linear subspaces containing $G$. The subset
$G$ is generating when $\overline{G} = E$.

One can prove that every $K$-vector space admits at least one basis. In
the most general setting, this is a consequence of the axiom of choice.
All the bases of $E$ have the same cardinality, which is therefore called the
dimension of $E$, denoted by $\dim E$. The dimension is an upper (respectively
a lower) bound for the cardinality of free (respectively generating) families.
In this book we shall only use finite-dimensional vector spaces. If $F, G$ are
two linear subspaces of $E$, the following formula holds:

$$\dim F + \dim G = \dim (F \cap G) + \dim (F + G).$$

If $F \cap G = \{0\}$, one writes $F \oplus G$ instead of $F + G$, and one says that $F$
and $G$ are in direct sum. One has then

$$\dim (F \oplus G) = \dim F + \dim G.$$



Given a set $I$, the family $(e^i)_{i \in I}$, defined by

$$(e^i)_j = \begin{cases} 0, & j \neq i, \\ 1, & j = i, \end{cases}$$

is a basis of $K^{(I)}$, called the canonical basis. The dimension of $K^{(I)}$ is therefore
equal to the cardinality of $I$.

In a vector space $E$, every generating family contains at least one basis of
$E$. Similarly, every free family is contained in at least one basis of $E$.
This is the incomplete basis theorem.

Let $L$ be a field and $K$ a subfield of $L$. If $F$ is an $L$-vector space, then $F$
is also a $K$-vector space. As a matter of fact, $L$ is itself a $K$-vector space,
and one has

$$\dim_K F = \dim_L F \cdot \dim_K L.$$

The most common example (the only one that we shall consider) is $K = \mathbb{R}$,
$L = \mathbb{C}$, for which we have

$$\dim_{\mathbb{R}} F = 2 \dim_{\mathbb{C}} F.$$

Conversely, if $G$ is an $\mathbb{R}$-vector space, one builds its complexification $G_{\mathbb{C}}$
as follows:

$$G_{\mathbb{C}} = G \times G,$$

with the induced structure of an additive group. An element $(x, y)$ of $G_{\mathbb{C}}$
is also denoted $x + iy$. One defines multiplication by a complex number by

$$(\lambda = a + ib,\ z = x + iy) \mapsto \lambda z := (ax - by,\ ay + bx).$$
One verifies easily that $G_{\mathbb{C}}$ is a $\mathbb{C}$-vector space, with

$$\dim_{\mathbb{C}} G_{\mathbb{C}} = \dim_{\mathbb{R}} G.$$

Furthermore, $G$ may be identified with an $\mathbb{R}$-linear subspace of $G_{\mathbb{C}}$ by

$$x \mapsto (x, 0).$$

Under this identification, one has $G_{\mathbb{C}} = G + iG$. In a more general setting,
one may consider two fields $K$ and $L$ with $K \subset L$, instead of $\mathbb{R}$ and $\mathbb{C}$, but
the construction of $G_L$ is more delicate and involves the notion of tensor
product. We shall not use it in this book.
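To make the definition concrete, here is a minimal Python sketch, assuming the particular choice $G = \mathbb{R}^2$ with rational coordinates so that the checks are exact. An element $x + iy$ of $G_{\mathbb{C}}$ is stored as a pair of tuples and a complex scalar $a + ib$ as the pair $(a, b)$; the helper names `add`, `smul`, `cprod`, and `embed` are hypothetical. The assertions spot-check the axiom $\lambda(\mu z) = (\lambda\mu)z$ and the decomposition $G_{\mathbb{C}} = G + iG$ on sample data; they illustrate the construction and do not replace Exercise 1.

```python
from fractions import Fraction

# Hypothetical model of G_C for G = R^2 (rational coordinates keep checks exact):
# x + iy is stored as a pair (x, y) of tuples, a scalar a + ib as the pair (a, b).

def add(z, w):
    """Addition in G_C = G x G, componentwise."""
    (x1, y1), (x2, y2) = z, w
    return (tuple(p + q for p, q in zip(x1, x2)), tuple(p + q for p, q in zip(y1, y2)))

def smul(lam, z):
    """(a + ib)(x + iy) := (ax - by, ay + bx)."""
    a, b = lam
    x, y = z
    return (tuple(a * xi - b * yi for xi, yi in zip(x, y)),
            tuple(a * yi + b * xi for xi, yi in zip(x, y)))

def cprod(lam, mu):
    """Product of two complex scalars given as (real part, imaginary part)."""
    (a, b), (c, d) = lam, mu
    return (a * c - b * d, a * d + b * c)

def embed(x):
    """The R-linear identification x -> (x, 0) of G inside G_C."""
    return (tuple(x), tuple(0 * xi for xi in x))

lam, mu = (Fraction(1), Fraction(2)), (Fraction(-3), Fraction(1, 2))
z = ((Fraction(1), Fraction(0)), (Fraction(2), Fraction(-1)))

assert smul(lam, smul(mu, z)) == smul(cprod(lam, mu), z)   # a(bz) = (ab)z
x, y = z
assert add(embed(x), smul((0, 1), embed(y))) == z          # z = x + iy, so G_C = G + iG
```

Storing scalars as pairs rather than Python `complex` values is a deliberate choice here: it keeps every computation in exact rational arithmetic.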

One says that a polynomial $P \in L[X]$ splits over $L$ if it can be written
as a product of the form

$$a \prod_{i=1}^{r} (X - a_i)^{n_i}, \qquad a, a_i \in L,\ r \in \mathbb{N},\ n_i \in \mathbb{N}^*.$$

Such a factorization is unique, up to the order of the factors. A field $L$ in
which every nonconstant polynomial $P \in L[X]$ admits a root, or
equivalently in which every polynomial $P \in L[X]$ splits, is algebraically closed. If
the field $K'$ contains the field $K$ and if every nonconstant polynomial $P \in K[X]$ admits
a root in $K'$, then the set of roots in $K'$ of nonzero polynomials in $K[X]$ is an
algebraically closed field that contains $K$, and it is the smallest such field. One
calls this field the algebraic closure of $K$. Every field $K$ admits an algebraic
closure, unique up to isomorphism, denoted by $\overline{K}$. The fundamental theorem
of algebra asserts that $\overline{\mathbb{R}} = \mathbb{C}$. The algebraic closure of $\mathbb{Q}$, for instance,
is the set of algebraic complex numbers, meaning that they are roots of
nonzero polynomials $P \in \mathbb{Z}[X]$.



1.1.2 Matrices 

Let $K$ be a field. If $n, m \geq 1$, a matrix of size $n \times m$ with entries in $K$ is a
map from $\{1, \dots, n\} \times \{1, \dots, m\}$ with values in $K$. One represents it as
an array with $n$ rows and $m$ columns, an element of $K$ (an entry) at each
point of intersection of a row and a column. In general, if $M$ is the name of
the matrix, one denotes by $m_{ij}$ the element at the intersection of the $i$th
row and the $j$th column. One has therefore

$$M = \begin{pmatrix} m_{11} & \cdots & m_{1m} \\ \vdots & & \vdots \\ m_{n1} & \cdots & m_{nm} \end{pmatrix},$$

which one also writes

$$M = (m_{ij})_{1 \leq i \leq n,\ 1 \leq j \leq m}.$$

In particular circumstances (extraction of matrices or minors, for example)
the rows and the columns can be numbered in a different way, using
nonconsecutive numbers. One needs only two finite sets, one for indexing the
rows, the other for indexing the columns.

The set of matrices of size $n \times m$ with entries in $K$ is denoted by $\mathbf{M}_{n \times m}(K)$.
It is an additive group, where $M + M'$ denotes the matrix $M''$
whose entries are given by $m''_{ij} = m_{ij} + m'_{ij}$. One defines likewise
multiplication by a scalar $a \in K$. The matrix $M' := aM$ is defined by $m'_{ij} = a m_{ij}$.
One has the formulas $a(bM) = (ab)M$, $a(M + M') = (aM) + (aM')$, and
$(a + b)M = (aM) + (bM)$, which endow $\mathbf{M}_{n \times m}(K)$ with a $K$-vector space
structure. The zero matrix is denoted by $0$, or $0_{nm}$ when one needs to avoid
ambiguity.

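When $m = n$, one writes simply $\mathbf{M}_n(K)$ instead of $\mathbf{M}_{n \times n}(K)$, and $0_n$
instead of $0_{nn}$. The matrices of sizes $n \times n$ are called square matrices. One
writes $I_n$ for the identity matrix, defined by

$$m_{ij} = \delta_{ij} = \begin{cases} 0, & \text{if } i \neq j, \\ 1, & \text{if } i = j. \end{cases}$$

In other words,

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$$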



The identity matrix is a special case of a permutation matrix. Permutation
matrices are square matrices having exactly one nonzero entry in each row and each
column, that entry being a 1. In other words, a permutation matrix $M$
reads

$$m_{ij} = \delta_{i\sigma(j)}$$

for some permutation $\sigma \in S_n$.
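As an illustration, the Python sketch below builds the matrix with entries $m_{ij} = \delta_{i\sigma(j)}$ from a permutation and tests the defining property (entries in $\{0, 1\}$, exactly one 1 in each row and each column). Indices are 0-based, as is usual in Python, and the function names are hypothetical.

```python
def permutation_matrix(sigma):
    """The n x n matrix with m_ij = 1 if i == sigma(j) and 0 otherwise.

    sigma is given by its images: sigma[j] is the image of j (0-based).
    """
    n = len(sigma)
    if sorted(sigma) != list(range(n)):
        raise ValueError("sigma must be a permutation of 0, ..., n-1")
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]

def is_permutation_matrix(m):
    """Square, 0/1 entries, exactly one 1 in each row and in each column."""
    n = len(m)
    if any(len(row) != n for row in m) or any(x not in (0, 1) for row in m for x in row):
        return False
    return (all(sum(row) == 1 for row in m)
            and all(sum(m[i][j] for i in range(n)) == 1 for j in range(n)))

assert permutation_matrix((0, 1, 2)) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # I_3
P = permutation_matrix((1, 2, 0))       # 0 -> 1, 1 -> 2, 2 -> 0
assert is_permutation_matrix(P)
assert not is_permutation_matrix([[1, 1], [0, 0]])
```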

A square matrix for which $i < j$ implies $m_{ij} = 0$ is called a lower
triangular matrix. It is upper triangular if $i > j$ implies $m_{ij} = 0$. It is
strictly upper triangular if $i \geq j$ implies $m_{ij} = 0$. Last, it is diagonal if $m_{ij}$
vanishes for every pair $(i, j)$ such that $i \neq j$. In particular, given $n$ scalars
$d_1, \dots, d_n \in K$, one denotes by $\operatorname{diag}(d_1, \dots, d_n)$ the diagonal matrix whose
diagonal term $m_{ii}$ equals $d_i$ for every index $i$.

When $m = 1$, a matrix $M$ of size $n \times 1$ is called a column vector. One
identifies it with the vector of $K^n$ whose $i$th coordinate in the canonical
basis is $m_{i1}$. This identification is an isomorphism between $\mathbf{M}_{n \times 1}(K)$ and
$K^n$. Likewise, the matrices of size $1 \times m$ are called row vectors.

A matrix $M \in \mathbf{M}_{n \times m}(K)$ may be viewed as the ordered list of its
columns $M^{(j)}$ ($1 \leq j \leq m$). The dimension of the linear subspace spanned
by the $M^{(j)}$ in $K^n$ is called the rank of $M$ and denoted by $\operatorname{rk} M$.
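One way to compute $\operatorname{rk} M$ in practice, sketched below in Python for matrices with rational entries, is Gaussian elimination. Each row operation multiplies $M$ on the left by an invertible matrix, which maps the column span isomorphically onto the new column span, so the rank is unchanged; at the end, each pivot column is a nonzero multiple of a distinct canonical basis vector, and every other column lies in the span of the earlier pivot columns. `Fraction` keeps the arithmetic exact; the function name `rank` is hypothetical.

```python
from fractions import Fraction

def rank(m):
    """rk M for a matrix with rational entries, by Gaussian elimination."""
    a = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(a), (len(a[0]) if a else 0)
    r = 0                                   # number of pivots found so far
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if p is None:
            continue                        # column c adds no new pivot
        a[r], a[p] = a[p], a[r]
        for i in range(n_rows):             # clear column c outside the pivot row
            if i != r and a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == n_rows:
            break
    return r

assert rank([[1, 2], [2, 4]]) == 1          # second column = 2 * first column
assert rank([[1, 0, 1], [0, 1, 1]]) == 2
assert rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == 2
```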
1.1.3 Product of Matrices

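Let $n, m, p \geq 1$ be three positive integers. We define a (noncommutative)
multiplication law

$$\mathbf{M}_{n \times m}(K) \times \mathbf{M}_{m \times p}(K) \to \mathbf{M}_{n \times p}(K), \qquad (M, M') \mapsto MM',$$

which we call the product of $M$ and $M'$. The matrix $M'' = MM'$ is given
by the formula

$$m''_{ij} = \sum_{k=1}^{m} m_{ik} m'_{kj}, \qquad 1 \leq i \leq n,\ 1 \leq j \leq p.$$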
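The formula for $m''_{ij}$ translates directly into code. The following Python sketch (matrices as lists of rows, hypothetical function name `matmul`) implements it, then spot-checks associativity on sample matrices and exhibits a pair of square matrices that do not commute.

```python
from fractions import Fraction

def matmul(m, mp):
    """Product of an n x m matrix by an m x p matrix: m''_ij = sum_k m_ik m'_kj."""
    inner = len(mp)
    if any(len(row) != inner for row in m):
        raise ValueError("the number of columns of M must equal the number of rows of M'")
    p = len(mp[0])
    return [[sum(row[k] * mp[k][j] for k in range(inner)) for j in range(p)] for row in m]

A = [[1, 2, 0], [0, 1, -1]]           # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]          # 3 x 2
C = [[Fraction(1, 2), 1], [0, -1]]    # 2 x 2

assert matmul(A, B) == [[5, 2], [2, -2]]
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))    # associativity
E12, E21 = [[0, 1], [0, 0]], [[0, 0], [1, 0]]
assert matmul(E12, E21) != matmul(E21, E12)                  # noncommutativity
```

Note that $BA$ is defined here as well, but it is a $3 \times 3$ matrix, so the question of commutation only makes sense for square matrices of the same size.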

We check easily that this law is associative: if $M$, $M'$, and $M''$ have
respective sizes $n \times m$, $m \times p$, $p \times q$, one has

$$(MM')M'' = M(M'M'').$$

The product is distributive with respect to addition:

$$M(M' + M'') = MM' + MM'', \qquad (M + M')M'' = MM'' + M'M''.$$

It also satisfies

$$a(MM') = (aM)M' = M(aM'), \qquad \forall a \in K.$$

Last, $I_n M = M I_m = M$ for every $M \in \mathbf{M}_{n \times m}(K)$.

The product is an internal composition law in $\mathbf{M}_n(K)$, which endows
this space with a structure of a unitary $K$-algebra. It is noncommutative
in general. For this reason, we define the commutator of $M$ and $N$ by
$[M, N] := MN - NM$. For a square matrix $M \in \mathbf{M}_n(K)$, one defines
$M^2 = MM$, $M^3 = MM^2 = M^2M$ (from associativity), ..., $M^{k+1} = M^k M$.
One completes this notation by $M^1 = M$ and $M^0 = I_n$. One has $M^j M^k = M^{j+k}$
for all $j, k \in \mathbb{N}$. If $M^k = 0$ for some integer $k \in \mathbb{N}$, one says that
$M$ is nilpotent. One says that $M$ is unipotent if $I_n - M$ is nilpotent.

One says that two matrices $M, N \in \mathbf{M}_n(K)$ commute with each other
if $MN = NM$. The powers of a square matrix $M$ commute pairwise. In
particular, the set $K[M]$ formed by polynomials in $M$, which consists of
matrices of the form

$$a_0 I_n + a_1 M + \cdots + a_r M^r, \qquad a_0, \dots, a_r \in K,\ r \in \mathbb{N},$$

is a commutative algebra.
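Powers, commutators, and nilpotent or unipotent matrices can be explored with the small Python sketch below. It relies on the standard fact, not proved in this section, that a nilpotent $n \times n$ matrix over a field already satisfies $M^n = 0$, so nilpotency can be tested by computing a single power. All helper names are hypothetical.

```python
def matmul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power(m, k):
    """M^k for k >= 0, with the convention M^0 = I_n."""
    result = identity(len(m))
    for _ in range(k):
        result = matmul(result, m)
    return result

def commutator(m, n):
    """[M, N] = MN - NM."""
    mn, nm = matmul(m, n), matmul(n, m)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(mn, nm)]

def is_nilpotent(m):
    # Assumed standard fact: an n x n nilpotent matrix over a field has M^n = 0.
    return all(x == 0 for row in power(m, len(m)) for x in row)

def is_unipotent(m):
    """M is unipotent when I_n - M is nilpotent."""
    n = len(m)
    return is_nilpotent([[(1 if i == j else 0) - m[i][j] for j in range(n)]
                         for i in range(n)])

N = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]   # strictly upper triangular
assert is_nilpotent(N) and not is_nilpotent(identity(3))
assert is_unipotent([[1, 5], [0, 1]])
assert matmul(power(N, 1), power(N, 1)) == power(N, 2)
assert commutator([[0, 1], [0, 0]], [[0, 0], [1, 0]]) == [[1, 0], [0, -1]]
```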

One also has the formula (see Exercise 2)

$$\operatorname{rk}(MM') \leq \min\{\operatorname{rk} M, \operatorname{rk} M'\}.$$



1.1.4 Matrices as Linear Maps

Let $E, F$ be two $K$-vector spaces. A map $u : E \to F$ is linear (one also
speaks of a homomorphism) if $u(x + y) = u(x) + u(y)$ and $u(ax) = au(x)$
for every $x, y \in E$ and $a \in K$. One then has $u(0) = 0$. The preimage
$u^{-1}(0)$, denoted by $\ker u$, is the kernel of $u$. It is a linear subspace of $E$.
The range $u(E)$ is also a linear subspace of $F$. The set of homomorphisms
of $E$ into $F$ is a $K$-vector space, denoted by $\mathcal{L}(E, F)$. If $F = E$, one defines
$\operatorname{End}(E) := \mathcal{L}(E, E)$; its elements are the endomorphisms of $E$.

The identification of $\mathbf{M}_{n \times 1}(K)$ with $K^n$ allows us to consider the
matrices of size $n \times m$ as linear maps from $K^m$ to $K^n$. If $M \in \mathbf{M}_{n \times m}(K)$, one
proceeds as in the following diagram:

$$K^m \to \mathbf{M}_{m \times 1}(K) \to \mathbf{M}_{n \times 1}(K) \to K^n, \qquad x \mapsto X \mapsto Y = MX \mapsto y.$$

Namely, the image of the vector $x$ with coordinates $x_1, \dots, x_m$ is the vector
$y$ with coordinates $y_1, \dots, y_n$ given by

$$y_i = \sum_{j=1}^{m} m_{ij} x_j. \tag{1.1}$$

One thus obtains an isomorphism between $\mathbf{M}_{n \times m}(K)$ and $\mathcal{L}(K^m, K^n)$,
which we shall use frequently in studying matrix properties.

More generally, if $E, F$ are $K$-vector spaces of respective dimensions $m$
and $n$, in which one chooses bases $\beta = \{e_1, \dots, e_m\}$ and $\gamma = \{f_1, \dots, f_n\}$,
one may construct the linear map $u : E \to F$ by

$$u(x_1 e_1 + \cdots + x_m e_m) = y_1 f_1 + \cdots + y_n f_n,$$

via the formulas (1.1). One says that $M$ is the matrix of $u$ in the bases $\beta$, $\gamma$.

Let $E, F, G$ be three $K$-vector spaces of dimensions $p, m, n$. Let us
choose respective bases $\alpha, \beta, \gamma$. Given two matrices $M, M'$ of sizes $n \times m$
and $m \times p$, corresponding to linear maps $u : F \to G$ and $u' : E \to F$, the
product $MM'$ is the matrix of the linear map $u \circ u' : E \to G$. Here lies
the origin of the definition of the product of matrices. The associativity
of the product expresses that of the composition of maps. One will note,
however, that the isomorphism between $\mathbf{M}_{n \times m}(K)$ and $\mathcal{L}(E, F)$ is by no
means canonical, since the correspondence $M \mapsto u$ always depends on an
arbitrary choice of two bases. One thus cannot reduce the entire theory of
matrices to that of linear maps, and vice versa.
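The correspondence between the product and composition can be checked in coordinates. In the Python sketch below, `apply` evaluates formula (1.1) on a coordinate vector; the dimensions and matrices are arbitrary sample data, and the helper names are hypothetical. Applying $u'$ and then $u$ gives the same coordinates as applying the single matrix $MM'$, and the image of the $j$th basis vector is the $j$th column.

```python
def apply(m, x):
    """Coordinates of u(x): y_i = sum_j m_ij x_j, as in formula (1.1)."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def matmul(a, b):
    """Matrix product: (ab)_ij = sum_k a_ik b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical data: dim E = 2, dim F = 3, dim G = 2.
M = [[1, -1, 0], [2, 0, 1]]      # matrix of u : F -> G   (2 x 3)
Mp = [[1, 0], [0, 1], [3, -2]]   # matrix of u' : E -> F  (3 x 2)

x = [4, -1]                      # coordinates of a vector of E
# Applying u' then u agrees with applying the single matrix MM'.
assert apply(M, apply(Mp, x)) == apply(matmul(M, Mp), x)
# The image of the j-th basis vector is the j-th column of the matrix.
assert apply(M, [0, 0, 1]) == [row[2] for row in M]
```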

When $E = F$ is a $K$-vector space of dimension $n$, it is often worth
choosing a single basis ($\gamma = \beta$ with the previous notation). One then has
an algebra isomorphism $M \mapsto u$ between $\mathbf{M}_n(K)$ and $\operatorname{End}(E)$, the algebra
of endomorphisms of $E$. Again, this isomorphism depends on an arbitrary
choice of basis.

If $M$ is the matrix of $u \in \mathcal{L}(E, F)$ in the bases $\beta, \gamma$, the linear subspace
$u(E)$ is spanned by the vectors of $F$ whose representations in the basis $\gamma$
are the columns $M^{(j)}$ of $M$. Its dimension thus equals $\operatorname{rk} M$.

If $M \in \mathbf{M}_{n \times m}(K)$, one defines the kernel of $M$ to be the set $\ker M$ of
those $X \in \mathbf{M}_{m \times 1}(K)$ such that $MX = 0_n$. The image of $K^m$ under $M$ is
called the range of $M$, sometimes denoted by $R(M)$. The kernel and the
range of $M$ are linear subspaces of $K^m$ and $K^n$, respectively. The range is
spanned by the columns of $M$ and therefore has dimension $\operatorname{rk} M$.

Proposition 1.1.1 Let $K$ be a field. If $M \in \mathbf{M}_{n \times m}(K)$, then

$$m = \dim \ker M + \operatorname{rk} M.$$

Proof

Let $\{f_1, \dots, f_r\}$ be a basis of $R(M)$. By construction, there exist vectors
$\{e_1, \dots, e_r\}$ of $K^m$ such that $Me_j = f_j$. Let $E$ be the linear subspace
spanned by the $e_j$. If $e = \sum_j a_j e_j \in \ker M$, then $\sum_j a_j f_j = 0$, and thus the
$a_j$ vanish. It follows that the restriction $M : E \to R(M)$ is an isomorphism,
so that $\dim E = \operatorname{rk} M$.

If $e \in K^m$, then $Me \in R(M)$, and there exists $e' \in E$ such that $Me' = Me$.
Therefore, $e = e' + (e - e') \in E + \ker M$, so that $K^m = E + \ker M$.
Since $E \cap \ker M = \{0\}$, one has $m = \dim E + \dim \ker M$.
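Proposition 1.1.1 can be observed numerically. The Python sketch below, which works over $\mathbb{Q}$ with hypothetical helper names `rref` and `kernel_basis`, computes a reduced row echelon form: the rank is the number of pivots, and each non-pivot (free) column yields one kernel vector, obtained by setting that free variable to 1 and the other free variables to 0. These vectors are free because each has a 1 at its own free coordinate and 0 at the others. The final assertion is a numerical check on sample data, not a substitute for the proof above.

```python
from fractions import Fraction

def rref(m):
    """Reduced row echelon form over Q; returns (R, list of pivot columns)."""
    a = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: free variable
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [x / piv for x in a[r]]    # make the pivot equal to 1
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return a, pivots

def kernel_basis(m):
    """One kernel vector per free column: that variable is 1, other free ones are 0."""
    R, pivots = rref(m)
    cols = len(m[0])
    basis = []
    for free in range(cols):
        if free in pivots:
            continue
        v = [Fraction(0)] * cols
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][free]
        basis.append(v)
    return basis

M = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]   # n = 3, m = 4
rk = len(rref(M)[1])                # rank = number of pivots
K = kernel_basis(M)
for v in K:                         # every basis vector really lies in ker M
    assert all(sum(a * b for a, b in zip(row, v)) == 0 for row in M)
assert len(K) + rk == len(M[0])     # m = dim ker M + rk M
```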



1.2 Change of Basis 

Let $E$ be a $K$-vector space, in which one chooses a basis $\beta = \{e_1, \dots, e_n\}$.
Let $P \in \mathbf{M}_n(K)$ be an invertible matrix.¹ The set $\beta' = \{e'_1, \dots, e'_n\}$ defined
by

$$e'_j = \sum_{i=1}^{n} p_{ij} e_i$$

is a basis of $E$. One says that $P$ is the matrix of the change of basis $\beta \mapsto \beta'$,
or the change-of-basis matrix. If $x \in E$ has coordinates $(x_1, \dots, x_n)$ in the
basis $\beta$ and $(x'_1, \dots, x'_n)$ in the basis $\beta'$, one then has the formulas

$$x_i = \sum_{j=1}^{n} p_{ij} x'_j.$$

If $u : E \to F$ is a linear map, one may compare the matrices of $u$ for
different choices of the bases of $E$ and $F$. Let $\beta, \beta'$ be bases of $E$ and let
$\gamma, \gamma'$ be bases of $F$. Let us denote by $P, Q$ the change-of-basis matrices of
$\beta \mapsto \beta'$ and $\gamma \mapsto \gamma'$. Finally, let $M, M'$ be the matrices of $u$ in the bases
$\beta, \gamma$ and $\beta', \gamma'$, respectively. Then

$$MP = QM',$$

or $M' = Q^{-1}MP$, where $Q^{-1}$ denotes the inverse of $Q$. One says that $M$
and $M'$ are equivalent. Two equivalent matrices have the same rank.
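The relation $MP = QM'$ can be verified on concrete data. In the Python sketch below, the matrices $M$, $P$, $Q$ are arbitrary sample values ($P$ and $Q$ chosen invertible), and `inverse` is a hypothetical helper implementing Gauss-Jordan elimination over $\mathbb{Q}$ on the augmented matrix $[Q \mid I]$. It computes $M' = Q^{-1}MP$ and checks the identity $MP = QM'$.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse(q):
    """Inverse over Q by Gauss-Jordan elimination on the augmented matrix [Q | I]."""
    n = len(q)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(q)]
    for c in range(n):
        p = next((i for i in range(c, n) if a[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[c])]
    return [row[n:] for row in a]

# Hypothetical data: dim E = 3, dim F = 2.
M = [[1, 2, 0], [0, 1, 1]]              # matrix of u in the bases beta, gamma
P = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]   # change of basis beta -> beta'
Q = [[2, 1], [1, 1]]                    # change of basis gamma -> gamma'

Mp = matmul(inverse(Q), matmul(M, P))   # M' = Q^{-1} M P
assert matmul(M, P) == matmul(Q, Mp)    # MP = QM'
```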



¹See Section 2.2 for the meaning of this notion.
If $E = F$ and $u \in \operatorname{End}(E)$, one may compare the matrices $M, M'$ of $u$
in two different bases $\beta, \beta'$ (here $\gamma = \beta$ and $\gamma' = \beta'$). The above formula
becomes

$$M' = P^{-1}MP.$$

One says that $M$ and $M'$ are similar, or that they are conjugate (the latter
term comes from group theory). One also says that $M'$ is the conjugate of
$M$ by $P$.

The equivalence and the similarity of matrices are two equivalence 
relations. They will be studied in Chapter 6. 



1.2.1 Block Decomposition

Considering matrices with entries in a ring $A$ does not cause difficulties, as
long as one limits oneself to addition and multiplication. However, when $A$
is not commutative, it is important to choose the formula

$$(MM')_{ij} = \sum_{k=1}^{m} m_{ik} m'_{kj}$$

when computing $(MM')_{ij}$, since this one corresponds to the composition
law when one identifies matrices with $A$-linear maps from $A^m$ to $A^n$.

When $m = n$, the product is a composition law in $\mathbf{M}_n(K)$. This space
is thus a $K$-algebra. In particular, it is a ring, and one may consider the
matrices with entries in $B = \mathbf{M}_n(K)$. Let $M \in \mathbf{M}_{p \times q}(B)$ have entries $M_{ij}$
(one chooses uppercase letters in order to keep in mind that the entries
are themselves matrices). One naturally identifies $M$ with the matrix
$M' \in \mathbf{M}_{pn \times qn}(K)$, whose entry of indices $((i - 1)n + k, (j - 1)n + l)$, for $i \leq p$,
$j \leq q$, and $k, l \leq n$, is nothing but $(M_{ij})_{kl}$.

One verifies easily that this identification is an isomorphism between
$\mathbf{M}_{p \times q}(B)$ and $\mathbf{M}_{pn \times qn}(K)$ as $K$-vector spaces.

More generally, choosing decompositions $n = n_1 + \cdots + n_r$, $m = m_1 + \cdots + m_s$
with $n_k, m_l \geq 1$, one may associate to every matrix $M \in \mathbf{M}_{n \times m}(K)$
an array $\bar{M}$ with $r$ rows and $s$ columns whose element of index $(k, l)$ is a
matrix $M_{kl} \in \mathbf{M}_{n_k \times m_l}(K)$. Defining

$$\nu_k = \sum_{t < k} n_t, \qquad \mu_l = \sum_{t < l} m_t \qquad (\nu_1 = \mu_1 = 0),$$

one has by definition

$$(M_{kl})_{ij} = m_{\nu_k + i,\ \mu_l + j}, \qquad 1 \leq i \leq n_k,\ 1 \leq j \leq m_l.$$

This procedure, which depends on the choice of $n_k, m_l$, is called block
decomposition.
Though $\bar{M}$ is not strictly speaking a matrix (except in the case studied
previously where the $n_k, m_l$ are all equal to each other), one still may define
the sum and the product of such objects. Concerning the product of $\bar{M}$ and
$\bar{M}'$, we must of course be able to compute the products $M_{kl} M'_{lt}$, and thus
the sizes of blocks must be compatible. One verifies easily that the block
decomposition behaves well with respect to the addition and the product.
For instance, if $n = n_1 + n_2$, $m = m_1 + m_2$ and $p = p_1 + p_2$, two matrices
$M, M'$ of sizes $n \times m$ and $m \times p$, with block decompositions $(M_{ij})_{1 \leq i, j \leq 2}$
and $(M'_{jk})_{1 \leq j, k \leq 2}$, have a product $M'' = MM' \in \mathbf{M}_{n \times p}(K)$, whose block
decomposition $\bar{M}''$ is given by

$$M''_{ik} = M_{i1} M'_{1k} + M_{i2} M'_{2k}, \qquad 1 \leq i, k \leq 2.$$
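The block formula $M''_{ik} = M_{i1} M'_{1k} + M_{i2} M'_{2k}$ can be checked against the ordinary product. The Python sketch below uses sample matrices and the hypothetical decompositions $n = 1 + 2$, $m = 2 + 1$, $p = 1 + 1$; block indices are 0-based, and `block`, `split`, and `assemble` are illustrative helpers, not library functions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def madd(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def block(m, rows, cols):
    """The submatrix of m with the given row and column index ranges."""
    return [[m[i][j] for j in cols] for i in rows]

def split(sizes):
    """Index ranges for a decomposition n = n_1 + n_2 (0-based)."""
    n1, n2 = sizes
    return [range(0, n1), range(n1, n1 + n2)]

def assemble(b):
    """Glue a 2 x 2 array of blocks back into a single matrix."""
    return ([r + s for r, s in zip(b[0][0], b[0][1])]
            + [r + s for r, s in zip(b[1][0], b[1][1])])

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # n = 1 + 2, m = 2 + 1
Mp = [[1, 0], [0, 1], [2, -1]]          # m = 2 + 1, p = 1 + 1
R, C, D = split((1, 2)), split((2, 1)), split((1, 1))

# M''_ik = M_i1 M'_1k + M_i2 M'_2k, with 0-based block indices.
blocks = [[madd(matmul(block(M, R[i], C[0]), block(Mp, C[0], D[k])),
                matmul(block(M, R[i], C[1]), block(Mp, C[1], D[k])))
           for k in range(2)] for i in range(2)]
assert assemble(blocks) == matmul(M, Mp)
```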

A square matrix $M$, whose block decomposition is the same according to
rows and columns (that is, $m_k = n_k$; in particular the diagonal blocks are
square matrices), is said to be lower block-triangular if the blocks $M_{kl}$ with $k < l$
are null blocks. One defines similarly the upper block-triangular matrices and
the block-diagonal matrices.

1.2.2 Transposition 

If $M \in \mathbf{M}_{n \times m}(K)$, one defines the transposed matrix of $M$ (or simply the
transpose of $M$) by

$$M^T = (m_{ji})_{1 \leq i \leq m,\ 1 \leq j \leq n}.$$

The transposed matrix has size $m \times n$, and its entries $\hat{m}_{ij}$ are given by
$\hat{m}_{ij} = m_{ji}$. When the product $MM'$ makes sense, one has $(MM')^T = (M')^T M^T$
(note that the orders in the two products are reversed). For two
matrices of the same size, $(M + M')^T = M^T + (M')^T$. Finally, if $a \in K$,
then $(aM)^T = a(M^T)$. The map $M \mapsto M^T$ defined on $\mathbf{M}_n(K)$ is thus linear,
but it is not an algebra endomorphism.

A matrix and its transpose have the same rank. A proof of this fact is 
given at the end of this section. 

For every matrix $M \in \mathbf{M}_{n \times m}(K)$, the products $M^T M$ and $MM^T$ always
make sense. These products are square matrices of sizes $m \times m$ and $n \times n$,
respectively.

A square matrix is said to be symmetric if $M^T = M$, and skew-symmetric
if $M^T = -M$ (notice that these two notions coincide when $K$ has
characteristic 2). When $M \in \mathbf{M}_{n \times m}(K)$, the matrices $M^T M$ and $MM^T$ are
symmetric. We denote by $\operatorname{Sym}_n(K)$ the subset of symmetric matrices in
$\mathbf{M}_n(K)$. It is a linear subspace of $\mathbf{M}_n(K)$. The product of two symmetric
matrices need not be symmetric.
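The rules stated above for transposition can be spot-checked in a few lines of Python. The sketch below, on arbitrary sample matrices with hypothetical helper names, verifies the reversal $(MM')^T = (M')^T M^T$, the symmetry of $M^T M$ and $MM^T$, and an example of two symmetric matrices whose product is not symmetric.

```python
def transpose(m):
    """M^T: the (i, j) entry of the result is m_ji."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

M = [[1, 2, 3], [0, -1, 4]]     # 2 x 3
Mp = [[2, 0], [1, 1], [0, 5]]   # 3 x 2

# (MM')^T = M'^T M^T: the order of the factors is reversed.
assert transpose(matmul(M, Mp)) == matmul(transpose(Mp), transpose(M))
# M^T M (3 x 3) and M M^T (2 x 2) are symmetric.
for S in (matmul(transpose(M), M), matmul(M, transpose(M))):
    assert S == transpose(S)
# The product of two symmetric matrices need not be symmetric.
A, B = [[1, 0], [0, 2]], [[0, 1], [1, 0]]
AB = matmul(A, B)
assert AB != transpose(AB)
```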

A square matrix is called orthogonal if $M^T M = I_n$. We shall see in
Section 2.2 that this condition is equivalent to $MM^T = I_n$.

If $M \in \mathbf{M}_{n \times m}(K)$, $y \in K^m$, and $x \in K^n$, then the product $x^T My$
belongs to $\mathbf{M}_1(K)$ and is therefore a scalar, equal to $y^T M^T x$. Saying that
$M = 0$ amounts to writing $x^T My = 0$ for every $x$ and $y$. If $m = n$ and
$x^T Mx = 0$ for every $x$, one says that $M$ is alternate. An alternate matrix
is skew-symmetric, since

$$x^T (M + M^T) y = x^T My + y^T Mx = (x + y)^T M (x + y) - x^T Mx - y^T My = 0.$$

The converse holds whenever the characteristic of $K$ is not 2, since

$$2x^T Mx = x^T (M + M^T) x = 0.$$

However, in characteristic 2 there exist matrices that are skew-symmetric
but not alternate. As a matter of fact, the diagonal of an alternate matrix
must vanish, though this need not be the case for a skew-symmetric matrix
in characteristic 2.
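A concrete instance in characteristic 2 is $I_2$ over $\mathbb{Z}/2\mathbb{Z}$: since $-1 = 1$ there, it is skew-symmetric, yet $x^T I_2 x = 1$ for $x = (1, 0)$. The Python sketch below checks this by brute force over all $2^n$ vectors, which is only feasible for tiny $n$; field elements are represented by the integers 0 and 1 reduced modulo 2, and the function names are hypothetical.

```python
from itertools import product

P = 2  # work in the field Z/2Z

def quad(m, x):
    """x^T M x computed modulo P."""
    n = len(m)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n)) % P

def is_skew_symmetric(m):
    """M^T = -M, i.e. m_ji + m_ij = 0 modulo P for all i, j."""
    n = len(m)
    return all((m[j][i] + m[i][j]) % P == 0 for i in range(n) for j in range(n))

def is_alternate(m):
    """x^T M x = 0 for every x in (Z/PZ)^n, checked exhaustively."""
    n = len(m)
    return all(quad(m, x) == 0 for x in product(range(P), repeat=n))

I2 = [[1, 0], [0, 1]]
J = [[0, 1], [1, 0]]
assert is_skew_symmetric(I2) and not is_alternate(I2)   # nonzero diagonal
assert is_skew_symmetric(J) and is_alternate(J)
```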

The interpretation of transposition in terms of linear maps is the
following. One provides $K^n$ with the bilinear form

$$\langle x, y \rangle := x^T y = y^T x = x_1 y_1 + \cdots + x_n y_n,$$

called the canonical scalar product; one proceeds similarly in $K^m$. If
$M \in \mathbf{M}_{n \times m}(K)$, there exists a unique matrix $N \in \mathbf{M}_{m \times n}(K)$ satisfying

$$\langle Mx, y \rangle = \langle x, Ny \rangle,$$

for all $x \in K^m$ and $y \in K^n$ (notice that the scalar products are defined on
distinct vector spaces). One checks easily that $N = M^T$. More generally, if
$E, F$ are $K$-vector spaces endowed with nondegenerate symmetric bilinear
forms, and if $u \in \mathcal{L}(E, F)$, then one can define a unique $u^T \in \mathcal{L}(F, E)$ from
the identity

$$\langle u(x), y \rangle_F = \langle x, u^T(y) \rangle_E, \qquad \forall x \in E,\ y \in F.$$

When $E = K^m$ and $F = K^n$ are endowed with their canonical bases and
canonical scalar products, the matrix associated to $u^T$ is the transpose of
the matrix associated to $u$.

Let $K$ be a field. Let us endow $K^m$ with its canonical scalar product. If
$F$ is a linear subspace of $K^m$, one defines the orthogonal subspace of $F$ by

$$F^\perp := \{x \in K^m ;\ \langle x, y \rangle = 0,\ \forall y \in F\}.$$

It is a linear subspace of $K^m$. We observe that for a general field, the
intersection $F \cap F^\perp$ can be nontrivial, and $K^m$ may differ from $F + F^\perp$.
One has nevertheless

$$\dim F + \dim F^\perp = m.$$

Actually, $F^\perp$ is the kernel of the linear map $T : K^m \to \mathcal{L}(F; K) =: F^*$,
defined by $T(x)(y) = \langle x, y \rangle$ for $x \in K^m$, $y \in F$. Let us show that $T$ is onto.
If $\{f_1, \dots, f_r\}$ is a basis of $F$, then every linear form $l$ on $F$ is a map

$$\sum_j y_j f_j \mapsto \sum_j y_j\, l(f_j).$$

Completing the basis of $F$ as a basis of $K^m$, one sees that $l$ is the
restriction of a linear form $L$ on $K^m$. Let us define the vector $x \in K^m$ by its
coordinates in the canonical basis: $x_j = L(e_j)$. One has $L(y) = \langle x, y \rangle$ for
every $y \in K^m$; that is, $l = T(x)$. Finally, since $\dim F^* = \dim F$, we obtain

$$m = \dim \ker T + \operatorname{rk} T = \dim F^\perp + \dim F^* = \dim F^\perp + \dim F.$$
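The phenomenon described above already occurs for $K = \mathbb{Z}/2\mathbb{Z}$ and $m = 2$: the line $F$ spanned by $(1, 1)$ is its own orthogonal, since $1 \cdot 1 + 1 \cdot 1 = 0$. The Python sketch below enumerates all vectors to confirm that $F^\perp = F$, that $F + F^\perp \neq K^2$, and that the dimension formula still holds. It represents field elements as integers reduced modulo 2 and recovers dimensions from cardinalities ($|S| = 2^{\dim S}$); the helper names are hypothetical.

```python
from itertools import product

P = 2   # the field Z/2Z, elements represented by 0 and 1
m = 2   # we work in K^m

def dot(x, y):
    """Canonical scalar product <x, y> = x_1 y_1 + ... + x_m y_m, modulo P."""
    return sum(a * b for a, b in zip(x, y)) % P

def span(vectors):
    """All linear combinations (mod P) of the given vectors of K^m."""
    return {tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) % P for i in range(m))
            for coeffs in product(range(P), repeat=len(vectors))}

def orthogonal(F):
    """F^perp = {x in K^m ; <x, y> = 0 for every y in F}."""
    return {x for x in product(range(P), repeat=m) if all(dot(x, y) == 0 for y in F)}

def dim(S):
    """Over Z/2Z a subspace S has 2^dim(S) elements."""
    return len(S).bit_length() - 1

F = span([(1, 1)])
Fp = orthogonal(F)
assert Fp == F                                   # F cap F^perp = F is not {0}
assert span(list(F | Fp)) != set(product(range(P), repeat=m))   # F + F^perp != K^2
assert dim(F) + dim(Fp) == m                     # yet dim F + dim F^perp = m
```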

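The dual formulas between kernels and ranges are frequently used. If
$M \in \mathbf{M}_{n \times m}(K)$, one has

$$\ker M = R(M^T)^\perp, \qquad \ker(M^T) = R(M)^\perp,$$

since $\langle Mx, y \rangle = \langle x, M^T y \rangle$. When $K = \mathbb{R}$, these orthogonal subspaces are
moreover supplementary: $\mathbb{R}^m = \ker M \oplus^\perp R(M^T)$ and $\mathbb{R}^n = \ker(M^T) \oplus^\perp R(M)$,
where $\oplus^\perp$ means a direct sum of orthogonal subspaces. In every case, we conclude that

$$\operatorname{rk} M^T = \dim R(M^T) = m - \dim R(M^T)^\perp = m - \dim \ker M,$$

and finally, by Proposition 1.1.1, that

$$\operatorname{rk} M^T = \operatorname{rk} M.$$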
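The identity $\ker M = R(M^T)^\perp$ and the equality $\operatorname{rk} M^T = \operatorname{rk} M$ hold over any field, even when the sum $\ker M + R(M^T)$ is not direct. The Python sketch below illustrates this by brute force for the hypothetical example $M = (1\ 1)$ over $\mathbb{Z}/2\mathbb{Z}$, where $\ker M$ and $R(M^T)$ coincide. Field elements are integers reduced modulo 2, and dimensions are recovered from cardinalities.

```python
from itertools import product

P = 2  # entries are read in Z/2Z

def apply(m, x):
    """The vector Mx, modulo P."""
    return tuple(sum(a * b for a, b in zip(row, x)) % P for row in m)

def transpose(m):
    return [list(col) for col in zip(*m)]

def space(k):
    """All vectors of (Z/2Z)^k."""
    return list(product(range(P), repeat=k))

def dim(S):
    return len(S).bit_length() - 1      # |S| = 2^dim(S) over Z/2Z

M = [[1, 1]]                            # n = 1, m = 2
n, m = len(M), len(M[0])
ker = {x for x in space(m) if all(v == 0 for v in apply(M, x))}
range_T = {apply(transpose(M), y) for y in space(n)}   # R(M^T), inside K^m
range_M = {apply(M, x) for x in space(m)}              # R(M), inside K^n
orth = {x for x in space(m)
        if all(sum(a * b for a, b in zip(x, z)) % P == 0 for z in range_T)}

assert ker == orth                            # ker M = R(M^T)^perp
assert ker & range_T != {(0, 0)}              # but the sum is not direct here
assert dim(range_T) == m - dim(ker) == dim(range_M)   # rk M^T = rk M
```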



1.2.3 Matrices and Bilinear Forms 

Let $E, F$ be two $K$-vector spaces. One chooses two respective bases
$\beta = \{e_1, \dots, e_n\}$ and $\gamma = \{f_1, \dots, f_m\}$. If $B : E \times F \to K$ is a bilinear form,
then

$$B(x, y) = \sum_{i, j} B(e_i, f_j) x_i y_j,$$

where the $x_i, y_j$ are the coordinates of $x, y$. One can define a matrix
$M \in \mathbf{M}_{n \times m}(K)$ by $m_{ij} = B(e_i, f_j)$. Conversely, if $M \in \mathbf{M}_{n \times m}(K)$ is given, one
can construct a bilinear form on $E \times F$ by the formula

$$B(x, y) := x^T M y = \sum_{i, j} m_{ij} x_i y_j.$$

Therefore, there is an isomorphism between $\mathbf{M}_{n \times m}(K)$ and the set of
bilinear forms on $E \times F$. One says that $M$ is the matrix of $B$ with respect
to the bases $\beta, \gamma$. This isomorphism depends on the choice of the bases.
A particular case arises when $E = K^n$ and $F = K^m$ are endowed with
canonical bases.

If $M$ is associated to $B$, it is clear that $M^T$ is associated to the bilinear
form defined on $F \times E$ by

$$(y, x) \mapsto B(x, y).$$

When $M$ is a square matrix, one may take $F = E$ and $\gamma = \beta$. In that
case, $M$ is symmetric if and only if $B$ is symmetric: $B(x, y) = B(y, x)$.
Likewise, one says that $B$ is alternate if $B(x, x) = 0$ for every $x$, that is, if $M$
itself is an alternate matrix.
If $B : E \times F \to K$ is bilinear, one can compare the matrices $M$ and
$M'$ of $B$ with respect to the bases $\beta, \gamma$ and $\beta', \gamma'$. Denoting by $P, Q$ the
change-of-basis matrices of $\beta \mapsto \beta'$ and $\gamma \mapsto \gamma'$, one has

$$m'_{ij} = B(e'_i, f'_j) = \sum_{k, l} p_{ki} q_{lj} B(e_k, f_l) = \sum_{k, l} p_{ki} q_{lj} m_{kl}.$$

Therefore,

$$M' = P^T M Q.$$

When $F = E$ and $\gamma = \beta$, $\gamma' = \beta'$, the change of basis has the effect of
replacing $M$ by $M' = P^T MP$. In general, $M'$ is not similar to $M$, though
it is so if $P$ is orthogonal. If $M$ is symmetric, then $M'$ is too. This was
expected, since one expresses the symmetry of the underlying bilinear form
$B$.
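The rule $M' = P^T M Q$ can be compared entry by entry with the definition $m'_{ij} = B(e'_i, f'_j)$. In the Python sketch below, the matrices are arbitrary sample data: column $j$ of $P$ (respectively $Q$) holds the coordinates of $e'_j$ in $\beta$ (respectively $f'_j$ in $\gamma$), and the helper names are hypothetical. The last lines check that congruence $M \mapsto P^T M P$ preserves symmetry.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def column(a, j):
    return [row[j] for row in a]

def bilinear(m, x, y):
    """B(x, y) = x^T M y = sum_ij m_ij x_i y_j, in coordinates."""
    return sum(m[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))

M = [[1, 2], [0, -1], [3, 1]]            # matrix of B in the bases beta, gamma
P = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]    # column j: coordinates of e'_j in beta
Q = [[2, 1], [1, 1]]                     # column j: coordinates of f'_j in gamma

Mp = matmul(transpose(P), matmul(M, Q))  # M' = P^T M Q
for i in range(3):
    for j in range(2):
        assert Mp[i][j] == bilinear(M, column(P, i), column(Q, j))   # m'_ij = B(e'_i, f'_j)

# For F = E and gamma = beta, a symmetric M stays symmetric under M -> P^T M P.
S = [[2, 1, 0], [1, 0, 3], [0, 3, 1]]
Sp = matmul(transpose(P), matmul(S, P))
assert Sp == transpose(Sp)
```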

If the characteristic of $K$ is distinct from 2, there is an isomorphism
between $\operatorname{Sym}_n(K)$ and the set of quadratic forms on $K^n$. This isomorphism
is given by the formula

$$Q\Big(\sum_i x_i e_i\Big) = \sum_{i, j} m_{ij} x_i x_j.$$

In particular, $Q(e_i) = m_{ii}$.
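The inverse of this isomorphism is given by polarization: for symmetric $M$ one has $Q(e_i + e_j) = m_{ii} + m_{jj} + 2m_{ij}$, so $m_{ij} = \frac{1}{2}\big(Q(e_i + e_j) - Q(e_i) - Q(e_j)\big)$, and the division by 2 is exactly where characteristic 2 must be excluded. The Python sketch below implements this recovery over $\mathbb{Q}$ with hypothetical helper names, and also shows that a nonsymmetric matrix defines the same quadratic form as its symmetric part.

```python
from fractions import Fraction

def quadratic(m, x):
    """Q(x) = x^T M x = sum_ij m_ij x_i x_j."""
    n = len(x)
    return sum(m[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetric_matrix_of(q, n):
    """Recover the symmetric matrix of a quadratic form q on K^n by polarization:
    m_ii = q(e_i) and m_ij = (q(e_i + e_j) - q(e_i) - q(e_j)) / 2 for i != j."""
    e = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = Fraction(q(e[i]))
            else:
                s = [a + b for a, b in zip(e[i], e[j])]
                m[i][j] = Fraction(q(s) - q(e[i]) - q(e[j]), 2)
    return m

M = [[2, 3], [3, -1]]          # a symmetric matrix
N = [[2, 4], [2, -1]]          # not symmetric, same symmetric part as M

def q_M(x):
    return quadratic(M, x)

def q_N(x):
    return quadratic(N, x)

assert symmetric_matrix_of(q_M, 2) == M   # the symmetric matrix is recovered from Q
assert symmetric_matrix_of(q_N, 2) == M   # Q only sees the symmetric part of N
```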



1.3 Exercises 

1. Let $G$ be an $\mathbb{R}$-vector space. Verify that its complexification $G_{\mathbb{C}}$ is a
$\mathbb{C}$-vector space and that $\dim_{\mathbb{C}} G_{\mathbb{C}} = \dim_{\mathbb{R}} G$.

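2. Let $M \in \mathbf{M}_{n \times m}(K)$ and $M' \in \mathbf{M}_{m \times p}(K)$ be given. Show that

$$\operatorname{rk}(MM') \leq \min\{\operatorname{rk} M, \operatorname{rk} M'\}.$$

First show that $\operatorname{rk}(MM') \leq \operatorname{rk} M$, and then apply this result to the
transpose matrix.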

3. Let $K$ be a field and let $A, B, C$ be matrices with entries in $K$, of
respective sizes $n \times m$, $m \times p$, and $p \times q$.

(a) Show that $\operatorname{rk} A + \operatorname{rk} B \leq m + \operatorname{rk} AB$. It is sufficient to consider
the case where $B$ is onto, by considering the restriction of $A$ to
the range of $B$.

(b) Show that $\operatorname{rk} AB + \operatorname{rk} BC \leq \operatorname{rk} B + \operatorname{rk} ABC$. One may use
the vector spaces $K^p / \ker B$ and $R(B)$, and construct three
homomorphisms $u, v, w$, with $v$ being onto.

4. (a) Let $n, n', m, m' \in \mathbb{N}^*$ and let $K$ be a field. If $B \in \mathbf{M}_{n \times m}(K)$ and
$C \in \mathbf{M}_{n' \times m'}(K)$, one defines a matrix $B \otimes C \in \mathbf{M}_{nn' \times mm'}(K)$,
the tensor product, whose block form is

$$B \otimes C = \begin{pmatrix} b_{11} C & \cdots & b_{1m} C \\ \vdots & & \vdots \\ b_{n1} C & \cdots & b_{nm} C \end{pmatrix}.$$

Show that $(B, C) \mapsto B \otimes C$ is a bilinear map and that its range
spans $\mathbf{M}_{nn' \times mm'}(K)$. Is this map onto?

(b) If $p, p' \in \mathbb{N}^*$ and $D \in \mathbf{M}_{m \times p}(K)$, $E \in \mathbf{M}_{m' \times p'}(K)$, then
compute $(B \otimes C)(D \otimes E)$.

(c) Show that for every bilinear form
$\phi : \mathbf{M}_{n \times m}(K) \times \mathbf{M}_{n' \times m'}(K) \to K$, there exists one and only one linear form

$$L : \mathbf{M}_{nn' \times mm'}(K) \to K$$

such that $L(B \otimes C) = \phi(B, C)$.




2 

Square Matrices 



The essential ingredient for the study of square matrices is the determinant.
For reasons that will be given in Section 2.5, as well as in Chapter 6, it
is useful to consider matrices with entries in a ring. This allows us to
consider matrices with entries in $\mathbb{Z}$ (rational integers) as well as in $K[X]$
(polynomials with coefficients in $K$). We shall assume that the ring $A$ of
scalars is a commutative (meaning that the multiplication is commutative)
integral domain (meaning that it does not have zero divisors: $ab = 0$ implies
either $a = 0$ or $b = 0$), with a unit denoted by 1, that is, an element
satisfying $1x = x1 = x$ for every $x \in A$. Observe that the ring $\mathbf{M}_n(A)$ is
not commutative if $n \geq 2$. For instance,

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

An element $a$ of $A$ is invertible if there exists $b \in A$ such that $ab = 1$.
The element $b$ is unique (because $A$ is an integral domain), and one calls it
the inverse of $a$, with the notation $b = a^{-1}$. The set of invertible elements
of $A$ is a multiplicative group, denoted by $A^*$. One has

$$(ab)^{-1} = b^{-1} a^{-1} = a^{-1} b^{-1}.$$



2.1 Determinants and Minors 

We recall that $S_n$, the symmetric group, denotes the group of permutations
over the set $\{1, \dots, n\}$.
Let $M \in M_n(A)$ be a square matrix. Its determinant is defined by

$$\det M := \sum_{\sigma \in S_n} \epsilon(\sigma)\, m_{1\sigma(1)} \cdots m_{n\sigma(n)},$$

where the sum ranges over all the permutations of the integers $1, \ldots, n$.
We denote by $\epsilon(\sigma) = \pm 1$ the signature of $\sigma$, equal to $+1$ if $\sigma$ is the product of
an even number of transpositions, and $-1$ otherwise. Recall that
$\epsilon(\sigma\sigma') = \epsilon(\sigma)\epsilon(\sigma')$.
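The definition of $\det M$ can be transcribed literally. The following sketch (Python, exact integer arithmetic; the function names are ours) computes $\epsilon(\sigma)$ from the parity of the number of inversions of $\sigma$, which agrees with the parity of the number of transpositions used to write $\sigma$, and then sums the $n!$ signed products. It is meant as an illustration for very small $n$, not as an efficient algorithm.

```python
from itertools import permutations
from math import prod


def signature(sigma):
    """Signature eps(sigma) of a permutation of (0, ..., n-1).

    We count inversions; their parity equals the parity of the number of
    transpositions in any decomposition of sigma.
    """
    n = len(sigma)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if sigma[a] > sigma[b])
    return -1 if inversions % 2 else 1


def det_leibniz(M):
    """det M = sum over sigma in S_n of eps(sigma) m_{1,sigma(1)} ... m_{n,sigma(n)}.

    Exact for integer (or Fraction) entries, but it sums n! terms.
    """
    n = len(M)
    return sum(signature(s) * prod(M[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


print(det_leibniz([[2, 1], [7, 4]]))                    # 1
print(det_leibniz([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3
```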

If $M$ is triangular, then all the products vanish other than the one
associated with the identity (that is, $\sigma(j) = j$). The determinant of a
triangular $M$ is thus equal to the product of the diagonal entries $m_{ii}$. In
particular, $\det I_n = 1$ and $\det 0_n = 0$. An analogous calculation shows that
the determinant of a block triangular matrix is equal to the product of the
determinants of the diagonal blocks $M_{jj}$.

Since $\epsilon(\sigma^{-1}) = \epsilon(\sigma)$, one has

$$\det M^T = \det M.$$

Looking at $M$ as a row matrix with entries in $A^n$ (its columns $M_1, \ldots, M_n$), one may view the
determinant as a multilinear form of the $n$ columns of $M$:

$$\det M = \det(M_1, \ldots, M_n).$$

This form is alternate: if two columns are equal, the determinant vanishes.
As a matter of fact, if the $i$th and the $j$th columns are equal, one groups the
permutations pairwise $(\sigma, \tau\sigma)$, where $\tau$ is the transposition $(i, j)$. For each
pair, both products are equal, up to the signatures, which are opposite;
their sum is thus zero. Likewise, if two rows are equal, the determinant is
zero.

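More generally, if the columns of $M$ satisfy a nontrivial relation of linear
dependence ($a_1, \ldots, a_n$ not all zero)

$$a_1 M_1 + \cdots + a_n M_n = 0$$

(that is, if $\operatorname{rk} M < n$), then $\det M$ is zero. Let us assume, for instance, that
$a_1$ is nonzero. For $j \ge 2$, one has

$$\det(M_j, M_2, \ldots, M_n) = 0,$$

since two columns coincide. Using the multilinearity, one has thus

$$a_1 \det M = \det(a_1 M_1, M_2, \ldots, M_n) + a_2 \det(M_2, M_2, \ldots, M_n) + \cdots + a_n \det(M_n, M_2, \ldots, M_n)$$

$$= \det(a_1 M_1 + \cdots + a_n M_n, M_2, \ldots, M_n) = \det(0, M_2, \ldots, M_n) = 0.$$

Since $A$ is an integral domain and $a_1 \neq 0$, we conclude that $\det M = 0$.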

For a matrix $M \in M_{n\times m}(A)$, not necessarily square, and $p \ge 1$ an integer
with $p \le m, n$, one may extract a $p \times p$ matrix $M' \in M_p(A)$ by retaining
only $p$ rows and $p$ columns of $M$. The determinant of such a matrix $M'$ is
called a minor of order $p$. Once the choice of the row indices $i_1 < \cdots < i_p$
and column indices $j_1 < \cdots < j_p$ has been made, one denotes by

$$M\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix}$$

the corresponding minor. A principal minor is a minor with equal row and
column indices, that is, of the form

$$M\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ i_1 & i_2 & \cdots & i_p \end{pmatrix}.$$

In particular, the leading principal minor of order $p$ is

$$M\begin{pmatrix} 1 & 2 & \cdots & p \\ 1 & 2 & \cdots & p \end{pmatrix}.$$

Given a matrix $M \in M_n(A)$, one associates the matrix $\hat{M}$ of cofactors,
defined as follows: its $(i, j)$-th entry $\hat{m}_{ij}$ is the minor of order $n - 1$ obtained
by removing the $i$th row and the $j$th column, multiplied by $(-1)^{i+j}$. It is
also the factor of $m_{ij}$ in the formula for the determinant of $M$. Finally, we
define the adjoint matrix $\operatorname{adj} M$ by

$$\operatorname{adj} M := \hat{M}^T.$$



Proposition 2.1.1 If $M \in M_n(A)$, one has

$$M(\operatorname{adj} M) = (\operatorname{adj} M)M = \det M \cdot I_n. \qquad (2.1)$$



Proof 

The identity is clear as far as diagonal terms are concerned; it amounts to
the definition of the determinant (see also below). The off-diagonal terms
of $M(\operatorname{adj} M)$ are sums involving on the one hand an index, and on
the other hand a permutation $\sigma \in S_n$. One groups the terms pairwise,
corresponding to permutations $\sigma$ and $\sigma\tau$, where $\tau$ is the transposition $(i, j)$.
The sum of two such terms is zero, so that every off-diagonal entry vanishes.

■ 

Proposition 2.1.1 contains the well-known and important expansion formula
for the determinant with respect to either a row or a column. Since the
cofactor $\hat{m}_{ij}$ already carries the sign $(-1)^{i+j}$, the expansion with respect
to the $i$th row is written

$$\det M = m_{i1}\hat{m}_{i1} + \cdots + m_{in}\hat{m}_{in},$$

while the expansion with respect to the $j$th column is

$$\det M = m_{1j}\hat{m}_{1j} + \cdots + m_{nj}\hat{m}_{nj}.$$
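Identity (2.1) and the row and column expansions can be checked mechanically. The sketch below (Python; names such as `cofactor` and `adjugate` are ours) computes the determinant recursively by expansion along the first row, builds the cofactors $\hat{m}_{ij}$ with their signs, and takes $\operatorname{adj} M$ as the transpose of the cofactor matrix. Indices are 0-based, which does not change the parity of $i + j$.

```python
def submatrix(M, i, j):
    """M with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def cofactor(M, i, j):
    """hat m_ij: (-1)^(i+j) times the minor obtained by deleting row i and column j.

    With 0-based indices, i + j has the same parity as with 1-based ones.
    """
    return (-1) ** (i + j) * det(submatrix(M, i, j))


def det(M):
    """Determinant by expansion along the first row (recursive)."""
    if not M:
        return 1  # the empty 0 x 0 matrix
    return sum(M[0][j] * cofactor(M, 0, j) for j in range(len(M)))


def adjugate(M):
    """adj M: the transpose of the matrix of cofactors."""
    n = len(M)
    return [[cofactor(M, j, i) for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Product of matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
d = det(M)
print(d)  # -3
scalar = [[d if i == j else 0 for j in range(3)] for i in range(3)]
# Identity (2.1): M (adj M) = (adj M) M = det M * I_n.
assert matmul(M, adjugate(M)) == scalar == matmul(adjugate(M), M)
# Expansion along every row and along every column gives det M.
assert all(sum(M[i][j] * cofactor(M, i, j) for j in range(3)) == d for i in range(3))
assert all(sum(M[i][j] * cofactor(M, i, j) for i in range(3)) == d for j in range(3))
```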



2.1.1 Irreducibility of the Determinant

By definition, the determinant is a polynomial function, in the sense that
$\det M$ is the value taken by a polynomial $\mathrm{Det}_A \in A[x_{11}, \ldots, x_{nn}]$ when the
$x_{ij}$'s are replaced by the scalars $m_{ij}$. We observe that $\mathrm{Det}_A$ does not really
depend on the ring $A$, in the sense that it is the image of $\mathrm{Det}_{\mathbb{Z}}$ through
the canonical ring homomorphism $\mathbb{Z} \to A$. For this reason, we shall simply
write $\mathrm{Det}$. The polynomial $\mathrm{Det}$ may be viewed as the determinant of the
matrix $(x_{ij})_{1 \le i, j \le n} \in M_n(A[x_{11}, \ldots, x_{nn}])$.

Theorem 2.1.1 The polynomial $\mathrm{Det}$ is irreducible in $A[x_{11}, \ldots, x_{nn}]$.

Proof 

We shall proceed by induction on the size $n$. If $n = 1$, there is nothing
to prove. Thus let us assume that $n \ge 2$. We denote by $D$ the ring of
polynomials in the $x_{ij}$ with $(i, j) \neq (1, 1)$, so that $A[x_{11}, \ldots, x_{nn}] = D[x_{11}]$.

From the expansion with respect to the first row, we see that $\mathrm{Det} = x_{11}P + Q$,
with $P, Q \in D$. Since $\mathrm{Det}$ is of degree one as a polynomial in $x_{11}$,
any factorization must be of the form $(x_{11}R + S)T$, with $R, S, T \in D$. In
particular, $RT = P$.

By induction, and since $P$ is the polynomial $\mathrm{Det}$ of $(n-1) \times (n-1)$
matrices, it is irreducible in $E$, the ring of polynomials in the $x_{ij}$'s with
$i, j > 1$. Therefore, it is also irreducible in $D$, since $D$ is the polynomial
ring $E[x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{n1}]$. Therefore, we may assume that either $R$
or $T$ equals $1$.

If the factorization is nontrivial, then $R = 1$ and $T = P$. It follows that
$P$ divides $\mathrm{Det}$. An expansion with respect to various rows shows similarly
that every minor of size $n - 1$, considered as an element of $A[x_{11}, \ldots, x_{nn}]$,
divides $\mathrm{Det}$. However, each such minor is irreducible, and they are pairwise
distinct, since they do not depend on the same set of $x_{ij}$'s. We conclude
that the product of all minors of size $n - 1$ divides $\mathrm{Det}$. In particular, the
degree $n$ of $\mathrm{Det}$ is greater than or equal to the degree $n^2(n - 1)$ of this
product, an obvious contradiction.



2.1.2 The Cauchy-Binet Formula 

In the sequel, we shall also use the following result.



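Proposition 2.1.2 Let $B \in M_{n\times m}(A)$, $C \in M_{m\times l}(A)$, and an integer
$p \le n, l$ be given. Let $1 \le i_1 < \cdots < i_p \le n$ and $1 \le k_1 < \cdots < k_p \le l$ be
indices. Then the minor

$$(BC)\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix}$$

is given by the formula

$$\sum_{1 \le j_1 < j_2 < \cdots < j_p \le m} B\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix} C\begin{pmatrix} j_1 & j_2 & \cdots & j_p \\ k_1 & k_2 & \cdots & k_p \end{pmatrix}.$$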
Corollary 2.1.1 Let $b, c \in A$. If $b$ divides every minor of order $p$ of $B$
and if $c$ divides every minor of order $p$ of $C$, then $bc$ divides every minor
of order $p$ of $BC$.

The particular case $l = m = n$ is fundamental:

Theorem 2.1.2 If $B, C \in M_n(A)$, then $\det(BC) = \det B \cdot \det C$.

In other words, the determinant is a multiplicative homomorphism from
$M_n(A)$ to $A$.

Proof 

The corollaries are trivial. We only prove the Cauchy-Binet formula.
Since the calculation of the $i$th row (respectively the $j$th column) of $BC$
involves only the $i$th row of $B$ (respectively the $j$th column of $C$), one
may assume that $p = n = l$. The minor to be evaluated is then $\det BC$. If
$m < n$, there is nothing to prove, since on the one hand the rank of $BC$
is less than or equal to $m$, thus $\det BC$ is zero, and on the other hand the
sum in the formula is empty.

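There remains the case $m \ge n$. Let us write the determinant of a matrix
$P$ as that of its columns $P_j$ and let us use the multilinearity of the
determinant:

$$\det BC = \det\Big(\sum_{j_1=1}^{m} c_{j_1 1} B_{j_1}, (BC)_2, \ldots, (BC)_n\Big)$$

$$= \sum_{j_1=1}^{m} c_{j_1 1} \det\Big(B_{j_1}, \sum_{j_2=1}^{m} c_{j_2 2} B_{j_2}, (BC)_3, \ldots, (BC)_n\Big) = \cdots$$

$$= \sum_{1 \le j_1, \ldots, j_n \le m} c_{j_1 1} \cdots c_{j_n n} \det(B_{j_1}, \ldots, B_{j_n}).$$

In the sum the determinant is zero as soon as $f \mapsto j_f$ is not injective,
since then there are two identical columns. If on the contrary $j$ is injective,
this determinant is a minor of $B$, up to the sign. This sign is that of the
permutation that puts $j_1, \ldots, j_n$ in increasing order. Grouping in the sum
the terms corresponding to the same minor, we find that $\det BC$ equals

$$\sum_{1 \le k_1 < \cdots < k_n \le m} \; \sum_{\sigma \in S_n} \epsilon(\sigma)\, c_{k_{\sigma(1)} 1} \cdots c_{k_{\sigma(n)} n}\, B\begin{pmatrix} 1 & 2 & \cdots & n \\ k_1 & k_2 & \cdots & k_n \end{pmatrix} = \sum_{1 \le k_1 < \cdots < k_n \le m} B\begin{pmatrix} 1 & \cdots & n \\ k_1 & \cdots & k_n \end{pmatrix} C\begin{pmatrix} k_1 & \cdots & k_n \\ 1 & \cdots & n \end{pmatrix},$$

which is the required formula.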
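The Cauchy-Binet formula lends itself to a brute-force check on small integer matrices. In this sketch (Python; helper names are ours, and row and column indices are 0-based tuples), `cauchy_binet` evaluates the sum over $j_1 < \cdots < j_p$ and is compared with the corresponding minor of $BC$. This only tests the formula on examples; it does not replace the proof.

```python
from itertools import combinations, permutations
from math import prod


def det(M):
    """Leibniz formula; the sign is the parity of the number of inversions."""
    n = len(M)
    total = 0
    for s in permutations(range(n)):
        inversions = sum(s[a] > s[b] for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inversions * prod(M[i][s[i]] for i in range(n))
    return total


def minor(M, rows, cols):
    """The minor M(rows; cols) for increasing 0-based index tuples."""
    return det([[M[r][c] for c in cols] for r in rows])


def matmul(P, Q):
    """Product of matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def cauchy_binet(B, C, rows, cols):
    """Right-hand side of Proposition 2.1.2: sum over j_1 < ... < j_p <= m."""
    m = len(C)  # inner dimension of the product BC
    return sum(minor(B, rows, js) * minor(C, js, cols)
               for js in combinations(range(m), len(rows)))


B = [[1, 2, 0], [3, -1, 4]]       # 2 x 3
C = [[2, 1], [0, 5], [-3, 2]]     # 3 x 2
BC = matmul(B, C)
for p in (1, 2):
    for rows in combinations(range(2), p):
        for cols in combinations(range(2), p):
            assert minor(BC, rows, cols) == cauchy_binet(B, C, rows, cols)
```

When $m < n$ the sum is empty, so `cauchy_binet` returns 0, in agreement with $\det BC = 0$.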



2.2 Invertibility 

Since $M_n(A)$ is not an integral domain, the notion of invertible elements
of $M_n(A)$ needs an auxiliary result, presented below.
Proposition 2.2.1 Given $M \in M_n(A)$, the following assertions are
equivalent:

1. There exists $N \in M_n(A)$ such that $MN = I_n$.

2. There exists $N' \in M_n(A)$ such that $N'M = I_n$.

3. $\det M$ is invertible.

If $M$ satisfies one of these equivalent conditions, then the matrices $N, N'$
are unique and one has $N = N'$.

Definition 2.2.1 One then says that $M$ is invertible. One also says sometimes
that $M$ is nonsingular, or regular. One calls the matrix $N = N'$ the
inverse of $M$, and one denotes it by $M^{-1}$. If $M$ is not invertible, one says
that $M$ is singular.

Proof 

Let us show that (1) is equivalent to (3). If $MN = I_n$, then $\det M \cdot
\det N = 1$; hence $\det M \in A^*$. Conversely, if $\det M$ is invertible,
$(\det M)^{-1}\operatorname{adj} M$ is an inverse of $M$ by (2.1). Analogously, (2) is equivalent
to (3). The three assertions are thus equivalent.

If $MN = N'M = I_n$, one has $N = (N'M)N = N'(MN) = N'$. This
equality between the left and right inverses shows that these are unique.

■ 
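Proposition 2.2.1 makes invertibility in $M_n(A)$ depend on the ring through $A^*$. For $A = \mathbb{Z}$ the units are $\pm 1$, so an integer matrix has an inverse with integer entries exactly when $\det M = \pm 1$, whereas in $M_n(\mathbb{Q})$ it suffices that $\det M \neq 0$. The sketch below (Python, with `fractions.Fraction` standing for $\mathbb{Q}$; function names are ours) computes the inverse as $(\det M)^{-1}\operatorname{adj} M$, following the proof.

```python
from fractions import Fraction


def det(M):
    """Determinant by first-row expansion (exact on integers)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def adjugate(M):
    """adj M = transpose of the matrix of cofactors."""
    n = len(M)

    def cof(i, j):
        sub = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
        return (-1) ** (i + j) * det(sub)

    return [[cof(j, i) for j in range(n)] for i in range(n)]


def inverse_over_Z(M):
    """Inverse in M_n(Z), or None when det M is not a unit (+1 or -1) of Z."""
    d = det(M)
    if d not in (1, -1):
        return None
    return [[d * a for a in row] for row in adjugate(M)]  # d^{-1} = d for d = +-1


def inverse_over_Q(M):
    """Inverse in M_n(Q), or None when det M = 0."""
    d = det(M)
    if d == 0:
        return None
    return [[Fraction(a, d) for a in row] for row in adjugate(M)]


print(inverse_over_Z([[2, 1], [7, 4]]))  # [[4, -1], [-7, 2]]
print(inverse_over_Z([[2, 0], [0, 1]]))  # None: det = 2 is not a unit of Z
```

The matrix $\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ is thus singular in $M_2(\mathbb{Z})$ but invertible in $M_2(\mathbb{Q})$, where `inverse_over_Q` returns $\begin{pmatrix} 1/2 & 0 \\ 0 & 1 \end{pmatrix}$.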

The set of the invertible elements of $M_n(A)$ is denoted by $GL_n(A)$ (for
“general linear group”). It is a multiplicative group, and one has

$$(MN)^{-1} = N^{-1}M^{-1}, \qquad (M^T)^{-1} = (M^{-1})^T, \qquad (M^*)^{-1} = (M^{-1})^*,$$

the last identity being meaningful for complex matrices. The matrix $(M^T)^{-1}$ is also written $M^{-T}$. If $k \in \mathbb{N}$, one writes $M^{-k} :=
(M^k)^{-1}$ and one has $M^j M^k = M^{j+k}$ for every $j, k \in \mathbb{Z}$.

The set of the matrices of determinant one is a normal subgroup of
$GL_n(A)$, since it is the kernel of the homomorphism $M \mapsto \det M$. It is
called the special linear group and is denoted by $SL_n(A)$.

The orthogonal matrices are invertible, and they satisfy the relation
$M^{-1} = M^T$. In particular, orthogonality is equivalent to $MM^T = I_n$.

The set of orthogonal matrices with entries in a field $K$ is obviously a
multiplicative group, and is denoted by $O_n(K)$. It is called the orthogonal
group. The determinant of an orthogonal matrix equals $\pm 1$, since

$$1 = \det M \cdot \det M^T = (\det M)^2.$$

The set $SO_n(K)$ of orthogonal matrices with determinant equal to $1$ is
obviously a normal subgroup of the orthogonal group. It is called the special
orthogonal group. It is simply the intersection of $O_n(K)$ with $SL_n(K)$.

A triangular matrix is invertible if and only if its diagonal entries are 
invertible; its inverse is then triangular of the same type, upper or lower. 
The proposition below is an immediate application of Theorem 2.1.2. 
Proposition 2.2.2 If $M, M' \in M_n(A)$ are similar (that is, $M' =
P^{-1}MP$ with $P \in GL_n(A)$), then

$$\det M' = \det M.$$



2.3 Alternate Matrices and the Pfaffian

The very simple structure of alternate forms is described in the following 
statement. 

Proposition 2.3.1 Let $B$ be an alternate bilinear form on a vector space
$E$ of dimension $n$. Then there exists a basis

$$\{x_1, y_1, \ldots, x_k, y_k, z_1, \ldots, z_{n-2k}\}$$

such that the matrix of $B$ in this basis is block-diagonal, equal to
$\operatorname{diag}(J, \ldots, J, 0, \ldots, 0)$, with $k$ blocks $J$ defined by

$$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Proof 

We proceed by induction on the dimension $n$. If $B = 0$, there is nothing to
prove. If $B$ is nonzero, there exist two vectors $x_1, y_1$ such that $B(x_1, y_1) \neq 0$.
Multiplying one of them by $B(x_1, y_1)^{-1}$, one may assume that $B(x_1, y_1) = 1$.
Since $B$ is alternate, $\{x_1, y_1\}$ is free. Let $N$ be the plane spanned by $x_1, y_1$.
The set of vectors $x$ satisfying $B(x, v) = 0$ (or equivalently $B(v, x) = 0$,
since $B$ must be skew-symmetric) for every $v$ in $N$ is denoted by $N^\perp$. The
formulas

$$B(ax_1 + by_1, x_1) = -b, \qquad B(ax_1 + by_1, y_1) = a$$

show that $N \cap N^\perp = \{0\}$. Additionally, every vector $x \in E$ can be written
as $x = y + n$, where $n \in N$ and $y \in N^\perp$ are given by

$$n := B(x, y_1)x_1 - B(x, x_1)y_1, \qquad y := x - n.$$

Therefore, $E = N \oplus N^\perp$. We now consider the restriction of $B$ to the
subspace $N^\perp$ and apply the induction hypothesis. There exists a basis
$\{x_2, y_2, \ldots, x_k, y_k, z_1, \ldots, z_{n-2k}\}$ of $N^\perp$ such that the matrix of the restriction of
$B$ in this basis is block-diagonal, equal to $\operatorname{diag}(J, \ldots, J, 0, \ldots, 0)$, with $k - 1$
blocks $J$, which means that $B(x_j, y_j) = 1 = -B(y_j, x_j)$ and $B(u, v) = 0$
for every other choice of $u, v$ in the basis. Obviously, this property extends
to the form $B$ itself and the basis $\{x_1, y_1, \ldots, x_k, y_k, z_1, \ldots, z_{n-2k}\}$.

■ 

We now choose an alternate matrix $M \in M_n(K)$ and apply Proposition
2.3.1 to the form defined by $M$. In view of Section 1.2.3, we have the
following.







Corollary 2.3.1 Given an alternate matrix $M \in M_n(K)$, there exists a
matrix $Q \in GL_n(K)$ such that

$$M = Q^T \operatorname{diag}(J, \ldots, J, 0, \ldots, 0)\, Q. \qquad (2.2)$$

Obviously, the rank of $M$, being the same as that of the block-diagonal
matrix, equals twice the number of $J$ blocks. Finally, since $\det J = 1$, we
have $\det M = e(\det Q)^2$, where $e = 0$ if there is a zero diagonal block in the
decomposition, and $e = 1$ otherwise. Thus we have proved the following
result.

Proposition 2.3.2 The rank of an alternate matrix $M$ is even. The number
of $J$ blocks in the identity (2.2) is half of that rank. In particular,
it does not depend on the decomposition. Finally, the determinant of an
alternate matrix is a square in $K$.

A very important application of Proposition 2.3.2 concerns the Pfaffian,
whose crude definition is a polynomial whose square is the determinant of
the general alternate matrix. First of all, since the rank of an alternate
matrix is even, $\det M = 0$ whenever $n$ is odd. Therefore, we restrict our
attention from now on to the even-dimensional case $n = 2m$. Let us consider
the field $F = \mathbb{Q}(x_{ij})$ of rational functions with rational coefficients, in
$n(n-1)/2$ indeterminates $x_{ij}$, $i < j$. We apply the proposition to the
alternate matrix $X$ whose $(i, i)$-entry is $0$ and $(i, j)$-entry (respectively $(j, i)$-entry)
is $x_{ij}$ (respectively $-x_{ij}$). Its determinant, a polynomial in $\mathbb{Z}[x_{ij}]$, is
the square of some rational function $f/g$ in lowest terms, where $f$ and $g$ belong
to $\mathbb{Z}[x_{ij}]$. From $g^2 \det X = f^2$, we see that $g$ divides $f$ in $\mathbb{Z}[x_{ij}]$. But since
$f$ and $g$ are coprime, one finds that $g$ is invertible; in other words $g = \pm 1$.
Thus

$$\det X = f^2. \qquad (2.3)$$

Now let $k$ be a field and let $M \in M_n(k)$ be alternate. There exists
a unique homomorphism from $\mathbb{Z}[x_{ij}]$ into $k$ sending $x_{ij}$ to $m_{ij}$. From
equation (2.3) we obtain

$$\det M = \big(f(m_{12}, \ldots, m_{n-1,n})\big)^2. \qquad (2.4)$$

In particular, if $k = \mathbb{Q}$ and $M = \operatorname{diag}(J, \ldots, J)$, one has $f(M)^2 = 1$. Up
to multiplication by $\pm 1$, which leaves unchanged the identity (2.3), we
may assume that $f(M) = 1$ for this special case. This determination of the
polynomial $f$ is called the Pfaffian and is denoted by $\mathrm{Pf}$. It may be viewed
as a polynomial function on the vector space of alternate matrices with
entries in a given field $k$. Equation (2.4) now reads

$$\det M = (\mathrm{Pf}(M))^2. \qquad (2.5)$$

Given an alternate matrix $M \in M_n(k)$ and a matrix $Q \in M_n(k)$, we
consider the Pfaffian of the alternate matrix $Q^T M Q$. We first consider the
case of the field of fractions $\mathbb{Q}(x_{ij}, y_{ij})$ in the $n^2 + n(n-1)/2$ indeterminates
$x_{ij}$ ($1 \le i < j \le n$) and $y_{ij}$ ($1 \le i, j \le n$). Let $Y$ be the matrix whose
$(i, j)$-entry is $y_{ij}$. Then, with $X$ as above,

$$(\mathrm{Pf}(Y^T X Y))^2 = \det(Y^T X Y) = (\det Y)^2 \det X = (\mathrm{Pf}(X)\det Y)^2.$$

Since $\mathbb{Z}[x_{ij}, y_{ij}]$ is an integral domain, we have the polynomial identity

$$\mathrm{Pf}(Y^T X Y) = \epsilon\, \mathrm{Pf}(X) \det Y, \qquad \epsilon = \pm 1.$$

As above, one infers that $\mathrm{Pf}(Q^T M Q) = \epsilon\, \mathrm{Pf}(M) \det Q$ for every field $k$,
matrix $Q \in M_n(k)$, and alternate matrix $M \in M_n(k)$. Inspection of the
particular case $Q = I_n$ yields $\epsilon = 1$. We summarize these results now.

Theorem 2.3.1 Let $n = 2m$ be an even integer. There exists a unique
polynomial $\mathrm{Pf}$ in the indeterminates $x_{ij}$ ($1 \le i < j \le n$) with integer
coefficients such that:

• For every field $k$ and every alternate matrix $M \in M_n(k)$, one has
$\det M = \mathrm{Pf}(M)^2$.

• If $M = \operatorname{diag}(J, \ldots, J)$, then $\mathrm{Pf}(M) = 1$.

Moreover, if $Q \in M_n(k)$ is given, then $\mathrm{Pf}(Q^T M Q) = \mathrm{Pf}(M)\det Q$.

We warn the reader that if $m > 1$, there does not exist a matrix $Z \in M_n(\mathbb{Q}[x_{ij}])$
such that $X = Z^T \operatorname{diag}(J, \ldots, J) Z$. The factorization of the polynomial
$\det X$ does not correspond to a similar factorization of $X$ itself. In other
words, the decomposition $X = Q^T \operatorname{diag}(J, \ldots, J) Q$ in $M_n(\mathbb{Q}(x_{ij}))$ cannot
be written within $M_n(\mathbb{Q}[x_{ij}])$.

The Pfaffian is computed easily for small values of $n$. For instance,
$\mathrm{Pf}(X) = x_{12}$ if $n = 2$, and $\mathrm{Pf}(X) = x_{12}x_{34} - x_{13}x_{24} + x_{14}x_{23}$ if $n = 4$.
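The Pfaffian can be computed for small $n$ by an expansion along the first row, $\mathrm{Pf}(A) = \sum_{j=2}^{n} (-1)^j a_{1j}\,\mathrm{Pf}(A_{\hat 1 \hat j})$, where $A_{\hat 1 \hat j}$ is $A$ with rows and columns $1$ and $j$ removed. This recursion is not derived in the text; we use it as a known fact and check it against the formula for $n = 4$ above and against Theorem 2.3.1 on examples. The sketch is in Python with exact integers; all names are ours.

```python
from itertools import permutations
from math import prod


def pfaffian(A):
    """Pfaffian by expansion along the first row.

    Pf(A) = sum_{j=2}^{n} (-1)^j a_{1j} Pf(A with rows/columns 1 and j deleted),
    with 1-based j; below j is 0-based, hence the sign (-1)^(j + 1).
    """
    n = len(A)
    if n % 2:
        raise ValueError("expected an alternate matrix of even size")
    if n == 0:
        return 1
    total = 0
    for j in range(1, n):
        keep = [k for k in range(n) if k not in (0, j)]
        rest = [[A[r][c] for c in keep] for r in keep]
        total += (-1) ** (j + 1) * A[0][j] * pfaffian(rest)
    return total


def alternate(n, upper):
    """Alternate n x n matrix with x_ij = upper[(i, j)] for i < j (0-based)."""
    X = [[0] * n for _ in range(n)]
    for (i, j), v in upper.items():
        X[i][j], X[j][i] = v, -v
    return X


def det(M):
    """Leibniz formula, sign from the parity of inversions."""
    n = len(M)
    return sum((-1) ** sum(s[a] > s[b] for a in range(n) for b in range(a + 1, n))
               * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(col) for col in zip(*M)]


# n = 4: Pf = x12 x34 - x13 x24 + x14 x23 = 1*6 - 2*5 + 3*4 = 8
X = alternate(4, {(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 2): 4, (1, 3): 5, (2, 3): 6})
print(pfaffian(X), det(X))  # 8 64
```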

2.4 Eigenvalues and Eigenvectors 

Let $K$ be a field and $E, F$ two vector spaces of finite dimension. Let us
recall that if $u : E \to F$ is a linear map, then

$$\dim E = \dim \ker u + \operatorname{rk} u,$$

where $\operatorname{rk} u$ denotes the dimension of $u(E)$ (the rank of $u$). In particular, if
$u \in \operatorname{End}(E)$, then

$$u \text{ is bijective} \iff u \text{ is injective} \iff u \text{ is surjective}.$$

Moreover, $u$ is bijective, that is invertible, in $\operatorname{End}(E)$, if and only if its
matrix $M$ in some basis $\beta$ is invertible, that is if its determinant is nonzero.
As a matter of fact, the matrix of $u^{-1}$ is $M^{-1}$; the existence of an inverse
(either that of $M$ or that of $u$) implies that of the other one. Finally, if
$M \in M_n(K)$, then $\det M \neq 0$ is equivalent to

$$\forall X \in K^n, \quad MX = 0 \implies X = 0.$$
In other words,

$$\det M = 0 \iff (\exists X \in K^n,\ X \neq 0,\ MX = 0).$$

More generally, since $MX = \lambda X$ ($\lambda \in K$) can also be written $(\lambda I_n - M)X = 0$,
one sees that $\det(\lambda I_n - M)$ is zero if and only if there exists a nonzero
vector $X \in K^n$ such that $MX = \lambda X$. One then says that $\lambda$ is an eigenvalue
of $M$ in $K$, and that $X$ is an eigenvector associated to $\lambda$. An eigenvector
is thus always a nonzero vector. The set of the eigenvalues of $M$ in $K$ is
called the spectrum of $M$ and is denoted by $\mathrm{Sp}_K(M)$.

A matrix in $M_n(K)$ may have no eigenvalues in $K$, as the following
example demonstrates, with $K = \mathbb{R}$:




In order to understand in detail the structure of a square matrix $M \in
M_n(K)$, one is thus led to consider $M$ as a matrix with entries in $\bar{K}$, an
algebraic closure of $K$. One then writes $\mathrm{Sp}(M)$ instead of $\mathrm{Sp}_{\bar{K}}(M)$, and one has
$\mathrm{Sp}_K(M) = K \cap \mathrm{Sp}(M)$, since the eigenvalues are characterized by $\det(\lambda I_n - M) = 0$,
and this equality has the same meaning in $\bar{K}$ as in $K$ when $\lambda \in K$.
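For $2 \times 2$ matrices the eigenvalues can be read off from $\det(\lambda I_2 - M) = \lambda^2 - (m_{11} + m_{22})\lambda + (m_{11}m_{22} - m_{12}m_{21})$. The sketch below (Python, `cmath` for complex square roots; function names are ours) applies this to the rotation matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which is our own choice of example: its eigenvalues are $\pm i$, so it has none in $\mathbb{R}$, while they exist in $\mathbb{C} = \bar{\mathbb{R}}$. Real roots are detected with a floating-point tolerance, an assumption of the sketch.

```python
import cmath


def eigenvalues_2x2(M):
    """Complex roots of lambda^2 - (m11 + m22) lambda + (m11 m22 - m12 m21)."""
    (a, b), (c, d) = M
    t, delta = a + d, a * d - b * c
    root = cmath.sqrt(t * t - 4 * delta)
    return (t + root) / 2, (t - root) / 2


def real_eigenvalues(M, tol=1e-12):
    """Keep only the roots lying in R (up to a rounding tolerance)."""
    return [lam.real for lam in eigenvalues_2x2(M) if abs(lam.imag) <= tol]


rotation = [[0, -1], [1, 0]]     # rotation by a right angle (our example)
print(real_eigenvalues(rotation))  # []: Sp_R is empty
```

Here `eigenvalues_2x2(rotation)` returns, up to rounding, the pair $i, -i$, whereas the symmetric matrix `[[2, 1], [1, 2]]` has the real eigenvalues 3 and 1.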



2.5 The Characteristic Polynomial 

The previous calculations show that the eigenvalues of $M \in M_n(K)$ are
the roots of the polynomial

$$P_M(X) := \det(XI_n - M).$$

Let us observe in passing that if $X$ is an indeterminate, then $XI_n - M \in
M_n(K[X])$. Its determinant $P_M$ is thus well-defined, since $K[X]$ is a
commutative integral domain with a unit element. One calls $P_M$ the characteristic
polynomial of $M$. Substituting $0$ for $X$, one sees that the constant
term in $P_M$ is simply $(-1)^n \det M$. Since the term corresponding to the
permutation $\sigma = \mathrm{id}$ in the computation of the determinant is of degree
$n$ (it is $\prod_i (X - m_{ii})$) and since the products corresponding to the other
permutations are of degree less than or equal to $n - 2$, one sees that $P_M$ is
of degree $n$, with

$$P_M(X) = X^n - \Big(\sum_{i=1}^{n} m_{ii}\Big) X^{n-1} + \cdots + (-1)^n \det M.$$

The coefficient $\sum_{i=1}^{n} m_{ii}$ is called the trace of $M$ and is denoted by $\operatorname{Tr} M$.
One has the trivial formula that if $N \in M_{n\times m}(K)$ and $P \in M_{m\times n}(K)$, then

$$\operatorname{Tr}(NP) = \operatorname{Tr}(PN).$$

For square matrices, this identity also becomes

$$\operatorname{Tr}[N, P] = 0,$$

where $[N, P] := NP - PN$ denotes the commutator.
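The identity $\operatorname{Tr}(NP) = \operatorname{Tr}(PN)$ holds even though $NP$ and $PN$ have different sizes when $n \neq m$: both traces equal $\sum_{i,k} n_{ik} p_{ki}$. A quick check in Python (names are ours) on a $2 \times 3$ and a $3 \times 2$ matrix:

```python
def matmul(P, Q):
    """Product of an n x m and an m x l matrix, both given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def trace(S):
    """Sum of the diagonal entries of a square matrix."""
    return sum(S[i][i] for i in range(len(S)))


N = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
P = [[1, 0], [2, -1], [0, 3]]    # 3 x 2
print(trace(matmul(N, P)), trace(matmul(P, N)))                     # 18 18
print(sum(N[i][k] * P[k][i] for i in range(2) for k in range(3)))   # 18
```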

Since $P_M$ possesses $n$ roots in $\bar{K}$, counting multiplicities, one sees that
a square matrix always has at least one eigenvalue, which, however, does
not necessarily belong to $K$. The multiplicity of $\lambda$ as a root of $P_M$ is called
the algebraic multiplicity of the eigenvalue $\lambda$. The geometric multiplicity of $\lambda$ is
the dimension of $\ker(\lambda I_n - M)$ in $\bar{K}^n$. The sum of the algebraic multiplicities
of the eigenvalues of $M$ (considered in $\bar{K}$) is $n$, the size of the matrix. An
eigenvalue of algebraic multiplicity one (that is, a simple root of $P_M$) is
called simple. It is geometrically simple if its geometric multiplicity equals
one.

The characteristic polynomial is a similarity invariant, in the following 
sense: 

Proposition 2.5.1 If $M$ and $M'$ are similar, then $P_M = P_{M'}$. In
particular, $\det M = \det M'$ and $\operatorname{Tr} M = \operatorname{Tr} M'$.

The proof is immediate. One deduces that the eigenvalues and their
algebraic multiplicities are similarity invariants. This is also true for the
geometric multiplicities, by a direct comparison of the kernels of $\lambda I_n - M$
and of $\lambda I_n - M'$. Furthermore, the expression obtained above for the
characteristic polynomial provides the following result.

Proposition 2.5.2 The product of the eigenvalues of $M$ (considered in
$\bar{K}$), counted with their algebraic multiplicities, is $\det M$. Their sum is $\operatorname{Tr} M$.

Let $\mu$ be the geometric multiplicity of an eigenvalue $\lambda$ of $M$. Let us choose
a basis $\gamma$ of $\ker(\lambda I_n - M)$, and then a basis $\beta$ of $\bar{K}^n$ that completes $\gamma$.
Using the change-of-basis matrix $P$ from the canonical basis to $\beta$, one sees
that $M$ is similar to a matrix $M' = P^{-1}MP$, whose $\mu$ first columns have
the form

$$\begin{pmatrix} \lambda I_\mu \\ 0_{n-\mu, \mu} \end{pmatrix}.$$

A direct calculation shows then that $(X - \lambda)^\mu$ divides $P_{M'}$, that is, $P_M$. The
geometric multiplicity is thus less than or equal to the algebraic multiplicity.

The characteristic polynomials of $M$ and $M^T$ are equal. Thus, $M$ and
$M^T$ have the same eigenvalues. We shall show in Chapter 6 a much deeper
result, namely that $M$ and $M^T$ are similar.

The main result concerning the characteristic polynomial is the Cayley- 
Hamilton theorem: 
Theorem 2.5.1 Let $M \in M_n(K)$. Let

$$P_M(X) = X^n + a_1 X^{n-1} + \cdots + a_n$$

be its characteristic polynomial. Then the matrix

$$M^n + a_1 M^{n-1} + \cdots + a_n I_n$$

equals $0_n$.



One also writes $P_M(M) = 0$. Though this formula looks trivial (obviously,
$\det(MI_n - M) = 0$), it is not. Actually, it must be understood in the
following way. Let us consider the expression $XI_n - M$ as a matrix with
entries in $K[X]$. When one substitutes a matrix $N$ for the indeterminate
$X$ in $XI_n - M$, one obtains a matrix of $M_n(A)$, where $A$ is the subring
of $M_n(K)$ spanned by $I_n$ and $N$ (one denotes it by $K[N]$). The ring $A$ is
commutative (but is not an integral domain in general), since it is the set
of the $q(N)$ for $q \in K[X]$. Therefore,

$$P_M(N) = \det \begin{pmatrix} N - m_{11} I_n & -m_{12} I_n & \cdots & -m_{1n} I_n \\ -m_{21} I_n & N - m_{22} I_n & \cdots & -m_{2n} I_n \\ \vdots & & \ddots & \vdots \\ -m_{n1} I_n & -m_{n2} I_n & \cdots & N - m_{nn} I_n \end{pmatrix}.$$

The Cayley-Hamilton theorem expresses that, for $N = M$, the determinant (which is
an element of $M_n(K)$, rather than of $K$) of this matrix is zero.

Proof 

Let $R \in M_n(K[X])$ be the matrix $XI_n - M$, and let $S$ be the adjoint of
$R$. Each $s_{ij}$ is a polynomial of degree less than or equal to $n - 1$, because
the products arising in the calculation of the cofactors involve $n - 1$ linear
or constant terms. Thus we may write

$$S = S_0 X^{n-1} + S_1 X^{n-2} + \cdots + S_{n-1},$$

where $S_j \in M_n(K)$. Let us now write $RS = (\det R) I_n = P_M(X) I_n$:

$$(XI_n - M)(S_0 X^{n-1} + \cdots + S_{n-1}) = (X^n + a_1 X^{n-1} + \cdots + a_n) I_n.$$

Identifying the powers of $X$, we obtain

$$\begin{aligned} S_0 &= I_n, \\ S_1 - MS_0 &= a_1 I_n, \\ &\;\;\vdots \\ S_j - MS_{j-1} &= a_j I_n, \\ &\;\;\vdots \\ S_{n-1} - MS_{n-2} &= a_{n-1} I_n, \\ -MS_{n-1} &= a_n I_n. \end{aligned}$$

Let us multiply these rows on the left by the powers of $M$, beginning with $M^n$ and
ending with $M^0 = I_n$. Summing the obtained equalities, we obtain the
expected formula.

■ 
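The relations $S_0 = I_n$ and $S_j - MS_{j-1} = a_j I_n$ from the proof also give a way to compute $P_M$ together with $\operatorname{adj} M$, provided one knows each $a_j$. The Faddeev-LeVerrier algorithm, which is not discussed in the text, supplies $a_j = -\operatorname{Tr}(MS_{j-1})/j$ (a consequence of Newton's identities that we take as known). The sketch below (Python with `fractions.Fraction`, since it divides by $j$; names are ours) runs these recurrences and then checks $P_M(M) = 0$ by Horner's scheme. Setting $X = 0$ in $S = \operatorname{adj}(XI_n - M)$ also shows $S_{n-1} = \operatorname{adj}(-M) = (-1)^{n-1}\operatorname{adj} M$.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def faddeev_leverrier(M):
    """Return ([1, a_1, ..., a_n], [S_0, ..., S_{n-1}]).

    Uses S_0 = I_n, a_j = -Tr(M S_{j-1}) / j and S_j = M S_{j-1} + a_j I_n,
    i.e. the relations S_j - M S_{j-1} = a_j I_n of the proof.
    """
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    S = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    a = [Fraction(1)]
    for j in range(1, n + 1):
        MS = matmul(M, S[-1])
        a_j = -sum(MS[i][i] for i in range(n)) / j
        a.append(a_j)
        if j < n:
            S.append([[MS[r][c] + (a_j if r == c else 0) for c in range(n)]
                      for r in range(n)])
    return a, S


def evaluate(a, M):
    """Horner's scheme for M^n + a_1 M^{n-1} + ... + a_n I_n."""
    n = len(M)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in a:
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += coeff
    return result


M = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
a, S = faddeev_leverrier(M)
print([int(x) for x in a])  # [1, -9, 24, -18]
# Cayley-Hamilton: P_M(M) = 0_n.
assert all(x == 0 for row in evaluate(a, M) for x in row)
```

Here $P_M(X) = X^3 - 9X^2 + 24X - 18$ and, since $n - 1 = 2$ is even, $S_2 = \operatorname{adj} M$. Because of the division by $j$, this approach needs $1, \ldots, n$ to be invertible in the field, as they are in $\mathbb{Q}$.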

For example, every $2 \times 2$ matrix satisfies the identity

$$M^2 - (\operatorname{Tr} M)M + (\det M)I_2 = 0_2.$$

2.5.1 The Minimal Polynomial 

For a square matrix $M \in M_n(K)$, let us denote by $J_M$ the set of polynomials
$Q \in K[X]$ such that $Q(M) = 0$. It is clearly an ideal of $K[X]$. Since
$K[X]$ is Euclidean, hence principal (see Sections 6.1.1 and 6.1.2), there exists
a polynomial $Q_M$ such that $J_M = K[X]Q_M$. In other words, $Q(M) = 0$
and $Q \in K[X]$ imply $Q_M \mid Q$. Theorem 2.5.1 shows that the ideal $J_M$ does
not reduce to $\{0\}$, because it contains the characteristic polynomial. Hence,
$Q_M \neq 0$ and one may choose it monic. This choice determines $Q_M$ in a
unique way, and one calls it the minimal polynomial of $M$. It divides the
characteristic polynomial.
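Since $Q_M$ generates $J_M$, its degree is the least $d$ for which $I_n, M, \ldots, M^d$ are linearly dependent, and the dependence relation with leading coefficient $1$ gives its coefficients. The sketch below (Python with exact `Fraction` arithmetic; names are ours) detects that first dependence by Gaussian elimination on the flattened powers. This is one straightforward way to compute $Q_M$, not a method taken from the text.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def minimal_polynomial(M):
    """Coefficients [c_0, ..., c_{d-1}, 1] (lowest degree first) of the monic Q_M.

    Each power M^k is flattened to a vector and reduced against the earlier
    ones; `combo` records the reduction as a combination of I, M, ..., M^k.
    """
    n = len(M)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M^0
    basis = []  # (pivot index, reduced vector, coefficients on I, M, ..., M^n)
    for k in range(n + 1):  # by Cayley-Hamilton, a dependence occurs for k <= n
        vec = [x for row in power for x in row]
        combo = [Fraction(int(i == k)) for i in range(n + 1)]
        for pivot, bvec, bcombo in basis:
            f = vec[pivot] / bvec[pivot]
            if f:
                vec = [x - f * y for x, y in zip(vec, bvec)]
                combo = [x - f * y for x, y in zip(combo, bcombo)]
        if not any(vec):
            return combo[:k + 1]  # coefficient of M^k is still 1
        pivot = next(i for i, x in enumerate(vec) if x)
        basis.append((pivot, vec, combo))
        power = matmul(power, M)
    raise AssertionError("unreachable, by the Cayley-Hamilton theorem")


print([int(c) for c in minimal_polynomial([[1, 0, 0], [0, 1, 0], [0, 0, 2]])])  # [2, -3, 1]
```

For $\operatorname{diag}(1, 1, 2)$ this gives $Q_M = X^2 - 3X + 2 = (X - 1)(X - 2)$, a proper divisor of $P_M = (X - 1)^2(X - 2)$.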

Contrary to the case of the characteristic polynomial, it is not immediate
that the minimal polynomial is independent of the field in which one
considers $J_M$ (note that we consider only fields that contain the entries of
$M$). We shall see in Section 6.3.2 that if $L$ is a field containing $K$, then the
minimal polynomials of $M$ in $K[X]$ and $L[X]$ are the same. This explains
the terminology.

Two similar matrices obviously have the same minimal polynomial, since
$Q(P^{-1}MP) = P^{-1}Q(M)P$.

If $\lambda$ is an eigenvalue of $M$, associated to an eigenvector $X$, and if $q \in K[X]$,
then $q(\lambda)X = q(M)X$. Applied to the minimal polynomial, this
equality gives $Q_M(\lambda)X = 0$, hence $Q_M(\lambda) = 0$ since $X \neq 0$: the minimal
polynomial is divisible by $X - \lambda$. Hence, if $P_M$ splits over $K$ in the form

$$P_M(X) = \prod_{j=1}^{r} (X - \lambda_j)^{n_j},$$

the $\lambda_j$ all being distinct, then the minimal polynomial can be written as

$$Q_M(X) = \prod_{j=1}^{r} (X - \lambda_j)^{m_j}$$

with $1 \le m_j \le n_j$. In particular, if every eigenvalue of $M$ is simple, the
minimal polynomial and the characteristic polynomial are equal.

An eigenvalue is called semi-simple if it is a simple root of the minimal
polynomial.
2.6 Diagonalization

If $\lambda \in K$ is an eigenvalue of $M$, one calls the linear subspace $E_K(\lambda) =
\ker(M - \lambda I_n)$ in $K^n$ the eigenspace associated to $\lambda$. It is formed of
eigenvectors associated to $\lambda$ on the one hand, and of the zero vector on the other
hand. Its dimension is nonzero. If $L$ is a field containing $K$ (an “extension”
of $K$), then $\dim_K E_K(\lambda) = \dim_L E_L(\lambda)$. This equality is not obvious. It
follows from the third canonical form with Jordan blocks, which we shall
see in Section 6.3.3.

If $\lambda_1, \ldots, \lambda_r$ are distinct eigenvalues, then the eigenspaces are in direct
sum. That is,

$$(x_1 \in E_K(\lambda_1), \ldots, x_r \in E_K(\lambda_r),\ x_1 + \cdots + x_r = 0) \implies (x_1 = \cdots = x_r = 0).$$

As a matter of fact, if there existed a relation $x_1 + \cdots + x_s = 0$ where
$x_1, \ldots, x_s$ did not vanish simultaneously (we say that it has length $s$), one
could choose such a relation of minimal length $r$. One then would have
$r \ge 2$. Multiplying this relation by $M - \lambda_r I_n$, one would obtain

$$(\lambda_1 - \lambda_r)x_1 + \cdots + (\lambda_{r-1} - \lambda_r)x_{r-1} = 0,$$

which is a nontrivial relation of length $r - 1$ for the vectors $(\lambda_j - \lambda_r)x_j \in
E_K(\lambda_j)$. This contradicts the minimality of $r$.

If all the eigenvalues of $M$ are in $K$ and if the algebraic and geometric
multiplicities coincide for each eigenvalue of $M$, the sum of the dimensions
of the eigenspaces equals $n$. Since these linear subspaces are in direct sum,
one deduces that

$$K^n = E(\lambda_1) \oplus \cdots \oplus E(\lambda_r).$$

Thus one may choose a basis of $K^n$ formed of eigenvectors. If $P$ is the
change-of-basis matrix from the canonical basis to the new one, then
$M' = P^{-1}MP$ is diagonal, and its diagonal terms are the eigenvalues,
repeated with their multiplicities. One says that $M$ is diagonalizable in $K$.
A particular case is that in which the eigenvalues of $M$ are in $K$ and are
simple.

Conversely, if $M$ is similar, in $M_n(K)$, to a diagonal matrix $M' =
P^{-1}MP$, then $P$ is a change-of-basis matrix from the canonical basis to
an eigenbasis (that is, a basis composed of eigenvectors) of $M$. Hence, $M$
is diagonalizable in $K$ if and only if its eigenvalues belong to $K$ and the
algebraic and geometric multiplicities of each eigenvalue coincide.

Two obstacles could prevent $M$ from being diagonalizable in $K$. The
first one is that an eigenvalue of $M$ may not belong to $K$. One can always
overcome this difficulty by moving towards $M_n(\bar{K})$. The second one is more
serious: in $\bar{K}$, the geometric multiplicity of an eigenvalue can be strictly
less than its algebraic multiplicity. For instance, a triangular matrix whose
diagonal vanishes has only one eigenvalue, zero, of algebraic multiplicity
$n$. Such a matrix is nilpotent. However, it is diagonalizable only if it is $0_n$,
since a diagonal matrix $M'$ similar to it has zero diagonal, so that $M' = 0$ and
$M = PM'P^{-1} = 0$. Hence,

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

is not diagonalizable.
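The failure of diagonalizability for nilpotent matrices shows up when comparing multiplicities. The sketch below (Python, exact rational arithmetic; names are ours) computes the geometric multiplicity as $n - \operatorname{rk}(M - \lambda I_n)$; for the algebraic multiplicity it restricts itself to triangular matrices, where $P_M(X) = \prod_i (X - m_{ii})$.

```python
from fractions import Fraction


def rank(A):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), (len(A[0]) if A else 0)
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def geometric_multiplicity(M, lam):
    """dim ker(M - lam I_n) = n - rk(M - lam I_n)."""
    n = len(M)
    return n - rank([[M[i][j] - (lam if i == j else 0) for j in range(n)]
                     for i in range(n)])


def algebraic_multiplicity_triangular(M, lam):
    """For triangular M, P_M(X) = prod (X - m_ii): count diagonal entries equal to lam."""
    return sum(1 for i in range(len(M)) if M[i][i] == lam)


N = [[0, 1], [0, 0]]
print(algebraic_multiplicity_triangular(N, 0), geometric_multiplicity(N, 0))  # 2 1
```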



2.7 Trigonalization 

Let us begin with an application of the Cayley-Hamilton theorem. 

Proposition 2.7.1 Let $M \in M_n(K)$ and let $P_M$ be its characteristic polynomial.
If $P_M = QR$ with coprime factors $Q, R \in K[X]$, then $K^n = E \oplus F$,
where $E, F$ are the ranges of $Q(M)$ and $R(M)$, respectively. Moreover, one
has $E = \ker R(M)$ and $F = \ker Q(M)$.

More generally, if $P_M = R_1 \cdots R_s$, where the $R_j$ are pairwise coprime, one has
$K^n = E_1 \oplus \cdots \oplus E_s$ with $E_j = \ker R_j(M)$.

Proof

It is sufficient to prove the first assertion. From Bézout’s theorem, there
exist $\tilde{R}, \tilde{Q} \in K[X]$ such that $R\tilde{R} + Q\tilde{Q} = 1$. Hence, every $x \in K^n$ can
be written as a sum $y + z$ with $y = Q(M)(\tilde{Q}(M)x) \in E$, and similarly
$z = R(M)(\tilde{R}(M)x) \in F$. Hence $K^n = E + F$.

Furthermore, for every $y \in E$, the Cayley-Hamilton theorem says that
$R(M)y = 0$. Likewise, $z \in F$ implies $Q(M)z = 0$. If $x \in E \cap F$, one has
thus $R(M)x = Q(M)x = 0$. Again using Bézout’s theorem, one obtains
$x = 0$. This proves $K^n = E \oplus F$.

Finally, $E \subset \ker R(M)$. Since these two vector spaces have the same
dimension (namely $n - \dim F$), they are equal. The same argument gives $F = \ker Q(M)$.

■ 
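Proposition 2.7.1 can be illustrated on a small example. In the sketch below (Python, exact arithmetic; names are ours), $M$ is chosen so that $P_M = (X - 1)^2(X - 2)$, with $Q = (X - 1)^2$ and $R = X - 2$. It checks that $\operatorname{rk} Q(M) + \operatorname{rk} R(M) = n$, that the columns of $Q(M)$ and $R(M)$ together span $K^n$, and that $R(M)Q(M) = 0$, which is the inclusion $E \subset \ker R(M)$.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def poly_at(coeffs, M):
    """Evaluate c_0 X^d + ... + c_d (highest degree first) at M by Horner's scheme."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result


def rank(A):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), (len(A[0]) if A else 0)
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


M = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]   # P_M = (X - 1)^2 (X - 2)
QM = poly_at([1, -2, 1], M)              # Q = (X - 1)^2
RM = poly_at([1, -2], M)                 # R = X - 2
print(rank(QM), rank(RM))                           # 1 2
print(rank([q + r for q, r in zip(QM, RM)]))        # 3: E + F = K^3
print(all(x == 0 for row in matmul(RM, QM) for x in row))  # True
```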

If $K$ is algebraically closed, we can split $P_M$ in the form

$$P_M(X) = \prod_{\lambda \in \mathrm{Sp}(M)} (X - \lambda)^{n_\lambda}.$$

From Proposition 2.7.1 one has $K^n = \bigoplus_\lambda E_\lambda$, where $E_\lambda = \ker(M - \lambda I)^{n_\lambda}$
is called a generalized eigenspace. Choosing a basis in each $E_\lambda$, we obtain a
new basis $\mathcal{B}$ of $K^n$. If $P$ is the matrix of the linear transformation from the
canonical basis to $\mathcal{B}$, the matrix $PMP^{-1}$ is block-diagonal, because each
$E_\lambda$ is stable under the action of $M$:

$$PMP^{-1} = \operatorname{diag}(\ldots, M_\lambda, \ldots).$$

The matrix $M_\lambda$ is that of the restriction of $M$ to $E_\lambda$. Since $E_\lambda = \ker(M - \lambda I)^{n_\lambda}$,
one has $(M_\lambda - \lambda I)^{n_\lambda} = 0$, so that $\lambda$ is the unique eigenvalue of $M_\lambda$.







Let us define $N_\lambda = M_\lambda - \lambda I_{n_\lambda}$, which is nilpotent. Let us also write

$$D' = \operatorname{diag}(\ldots, \lambda I_{n_\lambda}, \ldots), \qquad N' = \operatorname{diag}(\ldots, N_\lambda, \ldots),$$

and then $D = P^{-1}D'P$, $N = P^{-1}N'P$. The matrices $D', N'$ are respectively
diagonal and nilpotent. Moreover, they commute with each other:
$D'N' = N'D'$. One deduces the following result.

Proposition 2.7.2 If $K$ is algebraically closed, every matrix $M \in M_n(K)$
decomposes as a sum $M = D + N$, where $D$ is diagonalizable, $N$ is nilpotent,
$DN = ND$, and $\mathrm{Sp}(D) = \mathrm{Sp}(M)$.

Let us continue this analysis. 

Lemma 2.7.1 Every nilpotent matrix is similar to a strictly upper
triangular matrix (and also to a strictly lower triangular one).

Proof

Let us consider the nondecreasing sequence of linear subspaces $E_k =
\ker N^k$. Since $E_0 = \{0\}$ and $E_r = K^n$ for a suitable $r$, one can find a basis
$\{x^1, \ldots, x^n\}$ of $K^n$ such that $\{x^1, \ldots, x^j\}$ is a basis of $E_k$ if $j = \dim E_k$
(use the theorem that any linearly independent set can be enlarged to
a basis). Since $N(E_{k+1}) \subset E_k$, each vector $Nx^j$ is a linear combination of
$x^1, \ldots, x^{j-1}$. If $P$ is the change-of-basis matrix from this basis to the
canonical one, then $PNP^{-1}$ is strictly upper triangular.

■ 

Let us return to the decomposition $PMP^{-1} = D' + N'$ above. Each $N_\lambda$
can be written, from the lemma, in the form $R_\lambda^{-1}T_\lambda R_\lambda$, where $T_\lambda$ is strictly
upper triangular. Then $R_\lambda(\lambda I_{n_\lambda} + N_\lambda)R_\lambda^{-1} = \lambda I_{n_\lambda} + T_\lambda$ is triangular. Let us
set

$$R = \operatorname{diag}(\ldots, R_\lambda, \ldots).$$

Then $(RP)M(RP)^{-1}$ is block-diagonal, with the diagonal blocks upper
triangular, and hence this matrix is itself upper triangular.

Theorem 2.7.1 If $K$ is algebraically closed, then every square matrix is
similar to a triangular matrix (one says that it is trigonalizable).

More generally, if the characteristic polynomial of $M \in M_n(K)$ splits as
the product of linear factors, then $M$ is trigonalizable.

A direct proof of this theorem that does not use the three previous
statements is possible. Its strategy is used in the proof of Theorem 3.1.3.



2.8 Irreducibility 

A square matrix $A$ is said to be reducible if there exists a nontrivial partition
$\{1, \ldots, n\} = I \cup J$ such that $(i, j) \in I \times J$ implies $a_{ij} = 0$. It is irreducible
otherwise. Saying that a matrix is reducible is equivalent to saying that
there exists a permutation matrix $P$ such that $PAP^{-1}$ is of block-triangular
form

$$\begin{pmatrix} B & C \\ 0_{p, n-p} & D \end{pmatrix},$$

with $1 \le p \le n - 1$. As a matter of fact, $P$ is the matrix of the transformation
from a basis $\gamma$ to the canonical one, $\gamma$ being obtained by first writing
the vectors $e_j$ with $j \in J$, and then those with $j \in I$. Working in the new
basis amounts to decomposing the linear system $Ax = b$ into two subsystems
$Dz = d$ and $By = c - Cz$, which are to be solved successively. The
spectrum of $A$ is the union of those of $B$ and $D$, so that many interesting
questions concerning square matrices reduce to questions about irreducible
matrices.

We shall see in the exercises a characterization of irreducible matrices in 
terms of graphs. Here is a useful consequence of irreducibility. 

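Proposition 2.8.1 Let $M \in M_n(K)$ be an irreducible matrix such that
$i \ge j + 2$ implies $m_{ij} = 0$. Then the eigenvalues of $M$ are geometrically
simple.

Proof

The hypothesis implies that all entries $m_{i+1,i}$ are nonzero (if $m_{i+1,i} = 0$,
the partition $I = \{i+1, \dots, n\}$, $J = \{1, \dots, i\}$ would show that $M$ is
reducible). If $\lambda$ is an eigenvalue, let us consider the matrix $N \in M_{n-1}(K)$,
obtained from $M - \lambda I_n$ by deleting the first row and the last column. It is
an upper triangular matrix, whose diagonal terms $m_{21}, \dots, m_{n,n-1}$ are nonzero.
It is thus invertible, which implies $\mathrm{rk}(M - \lambda I_n) \ge n - 1$; since $\lambda$ is an
eigenvalue, $\mathrm{rk}(M - \lambda I_n) = n - 1$. Hence $\ker(M - \lambda I_n)$ is of dimension one.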
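The key step of the proof can be checked in exact arithmetic. The sketch below uses Python's standard fractions module and a tridiagonal matrix of our own choosing (so $m_{ij} = 0$ for $i \ge j + 2$) whose off-diagonal entries are nonzero; $\lambda = 2$ is one of its eigenvalues. The helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def shifted(m, lam):
    """Return M - lam * I_n."""
    n = len(m)
    return [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]


def deleted_first_row_last_col(a):
    """The matrix N of the proof: drop the first row and the last column."""
    return [row[:-1] for row in a[1:]]


M = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
A = shifted(M, 2)
print(deleted_first_row_last_col(A))  # upper triangular, diagonal = subdiagonal of M
print(rank(A))                        # n - 1 = 2, so ker(M - 2 I_3) is a line
```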



2.9 Exercises

1. Verify that the product of two triangular matrices of the same type
(upper or lower) is triangular, of the same type.

2. Prove in full detail that the determinant of a triangular matrix
(respectively a block-triangular one) equals the product of its diagonal
terms (respectively the product of the determinants of its diagonal
blocks).

3. Find matrices $M, N \in M_2(K)$ such that $MN = 0_2$ and $NM \neq 0_2$.
Such an example shows that $MN$ and $NM$ are not necessarily similar,
though they would be in the case where $M$ or $N$ is invertible.

4. Characterize the square matrices that are simultaneously orthogonal
and triangular.

5. One calls any square matrix $M$ satisfying $M^2 = M$ a projection
matrix, or projector.

(a) Let $P \in M_n(K)$ be a projector, and let $E = \ker P$, $F = \ker(I_n - P)$.
Show that $K^n = E \oplus F$.

(b) Let $P, Q$ be two projectors. Show that $(P - Q)^2$ commutes with
$P$ and with $Q$. Also, prove the identity



6. Let $M$ be a square matrix over a field $K$, which we write blockwise
as

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}.$$

The formula $\det M = \det(AD - BC)$ is meaningless in general, except
when $A, B, C, D$ have the same size. In that case the formula is false in
general, except for scalar blocks. Compare with Schur’s formula
(Proposition 8.1.2).



7. If $A, B, C, D \in M_m(K)$ and if $AC = CA$, show that the determinant
of

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

equals $\det(AD - CB)$. Begin with the case where $A$ is invertible, by
computing the product

$$\begin{pmatrix} I_m & 0_m \\ -C & A \end{pmatrix} M.$$

Then apply this intermediate result to the matrix $A - zI_m$, with $z \in K$
a suitable scalar.

Compare with the previous exercise.



8. Verify that the inverse of a triangular matrix, whenever it exists, is 
triangular of the same type. 



9. Show that the eigenvalues of a triangular matrix are its diagonal 
entries. What are their algebraic multiplicities? 

10. Let $A \in M_n(K)$ be given. One says that a list $(a_{1\sigma(1)}, \dots, a_{n\sigma(n)})$
is a diagonal of $A$ if $\sigma$ is a permutation (in that case, the diagonal
given by the identity is the main diagonal). Show the equivalence of
the following properties.

• Every diagonal of $A$ contains a zero element.

• There exists a null matrix extracted from $A$ of size $k \times l$ with
$k + l > n$.

11. Compute the number of elements in the group $GL_2(\mathbb{Z}/2\mathbb{Z})$. Show
that it is not commutative. Show that it is isomorphic to the
symmetric group $S_m$, for a suitable integer $m$.



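12. If $(a_0, \dots, a_{n-1}) \in \mathbb{C}^n$ is given, one defines the circulant matrix
$\mathrm{circ}(a_0, \dots, a_{n-1}) \in M_n(\mathbb{C})$ by

$$\mathrm{circ}(a_0, \dots, a_{n-1}) := \begin{pmatrix}
a_0 & a_1 & \cdots & a_{n-1} \\
a_{n-1} & a_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & a_1 \\
a_1 & \cdots & a_{n-1} & a_0
\end{pmatrix}.$$

We denote by $\mathcal{C}_n$ the set of circulant matrices. Obviously, the matrix
$\mathrm{circ}(1, 0, 0, \dots, 0)$ is the identity. The matrix $\mathrm{circ}(0, 1, 0, \dots, 0)$
is denoted by $\pi$.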



(a) Show that $\mathcal{C}_n$ is a subalgebra of $M_n(\mathbb{C})$, equal to $\mathbb{C}[\pi]$. Deduce
that it is isomorphic to the quotient ring $\mathbb{C}[X]/(X^n - 1)$.

(b) Let $C$ be a circulant matrix. Show that $C^*$, as well as $P(C)$, is
circulant for every polynomial $P$. If $C$ is nonsingular, show that
$C^{-1}$ is circulant.

(c) Show that the elements of $\mathcal{C}_n$ are diagonalizable in a common
eigenbasis.

(d) Replace $\mathbb{C}$ by any field $K$. If $K$ contains a primitive $n$th root $\omega$
of unity (that is, $\omega^n = 1$, and $\omega^m = 1$ implies $m \in n\mathbb{Z}$), show
that the elements of $\mathcal{C}_n$ are diagonalizable.

Note: A thorough presentation of circulant matrices and
applications is given in Davis’s book [12].

(e) One assumes that the characteristic of $K$ divides $n$. Show that
$\mathcal{C}_n$ contains matrices that are not diagonalizable.



13. Show that the Pfaffian is linear with respect to any row or column
of an alternate matrix. Deduce that the Pfaffian is an irreducible
polynomial in $\mathbb{Z}[x_{ij}]$.

14. (Schur’s Lemma).

Let $k$ be an algebraically closed field and $S$ a subset of $M_n(k)$. Assume
that the only linear subspaces of $k^n$ that are stable under every
element of $S$ are $\{0\}$ and $k^n$ itself. Let $A \in M_n(k)$ be a matrix that
commutes with every element of $S$. Show that there exists $c \in k$ such
that $A = cI_n$.



15. (a) Show that $A \in M_n(K)$ is irreducible if and only if for every pair
$(j, k)$ with $1 \le j, k \le n$, there exists a finite sequence of indices
$j = l_1, \dots, l_r = k$ such that $a_{l_s l_{s+1}} \neq 0$ for $1 \le s \le r - 1$.

(b) Show that a tridiagonal matrix $A \in M_n(K)$, for which none of
the $a_{j,j+1}$'s and $a_{j+1,j}$'s vanish, is irreducible.
16. Let $A \in M_n(k)$ ($k = \mathbb{R}$ or $\mathbb{C}$) be given, with minimal polynomial $q$.
If $x \in k^n$, the set

$$I_x := \{p \in k[X] \mid p(A)x = 0\}$$

is an ideal of $k[X]$, which is therefore principal.

(a) Show that $I_x \neq (0)$ and that its monic generator, denoted by
$p_x$, divides $q$.

(b) One writes $r_j$ instead of $p_x$ when $x = e^j$. Show that $q$ is the
least common multiple of $r_1, \dots, r_n$.

(c) If $p \in k[X]$, show that the set

$$V_p := \{x \in k^n \mid p_x \in (p)\}$$

(the vectors $x$ such that $p$ divides $p_x$) is open.

(d) Let $x \in k^n$ be an element for which $p_x$ is of maximal degree.
Show that $p_x = q$. Note: In fact, the existence of an element $x$
such that $p_x$ equals the minimal polynomial holds true for every
field $k$.



17. Let $k$ be a field and $A \in M_{n\times m}(k)$, $B \in M_{m\times n}(k)$ be given.

(a) Let us define

$$M = \begin{pmatrix} XI_n & A \\ B & XI_m \end{pmatrix}.$$

Show that $X^m \det M = X^n \det(X^2 I_m - BA)$ (search for a lower
triangular matrix $M'$ such that $M'M$ is upper triangular).

(b) Find an analogous relation between $\det(X^2 I_n - AB)$ and $\det M$.
Deduce that $X^n P_{BA}(X) = X^m P_{AB}(X)$.

(c) What do you deduce about the eigenvalues of $AB$ and of $BA$?



18. Let $k$ be a field and $\theta : M_n(k) \to k$ a linear form satisfying
$\theta(AB) = \theta(BA)$ for every $A, B \in M_n(k)$.

(a) Show that there exists $\alpha \in k$ such that for all $X, Y \in k^n$, one
has $\theta(XY^T) = \alpha\sum_j x_jy_j$.

(b) Deduce that $\theta = \alpha\,\mathrm{Tr}$.



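19. Let $A_n$ be the ring $K[X_1, \dots, X_n]$ of polynomials in $n$ variables.
Consider the matrix $M \in M_n(A_n)$ defined by

$$M = \begin{pmatrix}
1 & \cdots & 1 \\
X_1 & \cdots & X_n \\
X_1^2 & \cdots & X_n^2 \\
\vdots & & \vdots \\
X_1^{n-1} & \cdots & X_n^{n-1}
\end{pmatrix}.$$

Let us denote by $\Delta(X_1, \dots, X_n)$ the determinant of $M$.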
(a) Show that for every $i \neq j$, the polynomial $X_j - X_i$ divides $\Delta$.

(b) Deduce that

$$\Delta = a\prod_{i<j} (X_j - X_i),$$

where $a \in K$.

(c) Determine the value of $a$ by considering the monomial

$$\prod_{j=1}^{n} X_j^{j-1}.$$

(d) Redo this analysis for the matrix

$$\left(X_j^{p_i}\right)_{1 \le i, j \le n},$$

where $p_1, \dots, p_n$ are nonnegative integers.

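20. Deduce from the previous exercise that the determinant of the
Vandermonde matrix

$$\begin{pmatrix}
1 & \cdots & 1 \\
a_1 & \cdots & a_n \\
\vdots & & \vdots \\
a_1^{n-1} & \cdots & a_n^{n-1}
\end{pmatrix}, \qquad a_1, \dots, a_n \in K,$$

is zero if and only if at least two of the $a_j$'s coincide.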

21. A matrix $A \in M_n(\mathbb{R})$ is called a totally positive matrix when all
minors

$$A\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix}$$

with $1 \le p \le n$, $1 \le i_1 < \cdots < i_p \le n$ and $1 \le j_1 < \cdots < j_p \le n$ are
positive.

(a) Prove that the product of totally positive matrices is totally
positive.

(b) Prove that a totally positive matrix admits an LU factorization
(see Chapter 8), and that every “nontrivial” minor of $L$ and $U$
is positive. Here, “nontrivial” means

$$L\begin{pmatrix} i_1 & i_2 & \cdots & i_p \\ j_1 & j_2 & \cdots & j_p \end{pmatrix}$$

with $1 \le p \le n$, $1 \le i_1 < \cdots < i_p \le n$, $1 \le j_1 < \cdots < j_p \le n$,
and $i_s \ge j_s$ for every $s$. For $U$, read $i_s \le j_s$ instead. Note: One
says that $L$ and $U$ are triangular totally positive.

(c) Show that a Vandermonde matrix (see the previous exercise) is
totally positive whenever $0 < a_1 < \cdots < a_n$.

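22. Multiplying a Vandermonde matrix by its transpose, show that

$$\det\begin{pmatrix}
n & s_1 & \cdots & s_{n-1} \\
s_1 & s_2 & \cdots & s_n \\
\vdots & \vdots & & \vdots \\
s_{n-1} & s_n & \cdots & s_{2n-2}
\end{pmatrix} = \prod_{i<j} (a_j - a_i)^2,$$

where $s_k := a_1^k + \cdots + a_n^k$.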

23. The discriminant of a matrix $A \in M_n(k)$ is the number

$$d(A) := \prod_{i<j} (\lambda_j - \lambda_i)^2,$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$, counted with multiplicity.

(a) Verify that the polynomial

$$\Delta(X_1, \dots, X_n) := \prod_{i<j} (X_j - X_i)^2$$

is symmetric. Therefore, there exists a unique polynomial
$Q \in \mathbb{Z}[Y_1, \dots, Y_n]$ such that

$$\Delta = Q(\sigma_1, \dots, \sigma_n),$$

where the $\sigma_j$'s are the elementary symmetric polynomials
$\sigma_1 = X_1 + \cdots + X_n, \dots, \sigma_n = X_1 \cdots X_n$.

(b) Deduce that there exists a polynomial $D \in \mathbb{Z}[x_{ij}]$ in the indeterminates
$x_{ij}$, $1 \le i, j \le n$, such that for every $k$ and every square
matrix $A$,

$$d(A) = D(a_{11}, a_{12}, \dots, a_{nn}).$$

(c) Consider the restriction $D_s$ of the discriminant to symmetric
matrices, where $x_{ji}$ is replaced by $x_{ij}$ whenever $i < j$. Prove that
$D_s$ takes only nonnegative values on $\mathbb{R}^{n(n+1)/2}$. Show, however,
that $D_s$ is not the square of a polynomial if $n \ge 2$ (consider first
the case $n = 2$).
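24. Let $P \in k[X]$ be a polynomial of degree $n$ that splits completely in
$k$. Let $B_P$ be the companion matrix

$$B_P := \begin{pmatrix}
0 & \cdots & \cdots & 0 & -a_0 \\
1 & \ddots & & \vdots & -a_1 \\
0 & \ddots & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & 1 & -a_{n-1}
\end{pmatrix}.$$

Find a matrix $H \in M_n(k)$, whose transpose is of Vandermonde type,
such that

$$HB_P = \mathrm{diag}(\lambda_1, \dots, \lambda_n)H,$$

where $\lambda_1, \dots, \lambda_n$ are the roots of $P$. This furnishes a direct proof of
the fact that when the roots of $P$ are simple, $B_P$ is diagonalizable.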

25. (E. Formanek [14])

Let $k$ be a field of characteristic 0.

(a) Show that for every $A, B, C \in M_2(k)$,

$$[[A, B]^2, C] = 0.$$

Hint: use the Cayley-Hamilton theorem.

(b) Show that for every $M, N \in M_2(k)$,

$$MN + NM - \mathrm{Tr}(M)N - \mathrm{Tr}(N)M + \left(\mathrm{Tr}(M)\mathrm{Tr}(N) - \mathrm{Tr}(MN)\right)I_2 = 0.$$

One may begin with the case $M = N$ and recognize a classical
theorem, then “bilinearize” the formula.

(c) If $\pi \in S_r$ ($S_r$ is the symmetric group over $\{1, \dots, r\}$), one
defines a map $T_\pi : M_2(k)^r \to k$ in the following way. One decomposes
$\pi$ as a product of disjoint cycles, including the cycles
of order one, which are the fixed points of $\pi$:

$$\pi = (a_1, \dots, a_{k_1})(b_1, \dots, b_{k_2}) \cdots.$$

One sets then

$$T_\pi(N_1, \dots, N_r) = \mathrm{Tr}(N_{a_1} \cdots N_{a_{k_1}})\,\mathrm{Tr}(N_{b_1} \cdots N_{b_{k_2}}) \cdots$$

(note that the right-hand side depends neither on the order of
the cycles in the product nor on the choice of the first index
inside each cycle, because of the formula $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$).
Show that for every $N_1, N_2, N_3 \in M_2(k)$, one has

$$\sum_{\pi \in S_3} \epsilon(\pi)\,T_\pi(N_1, N_2, N_3) = 0.$$
(d) Generalize this result to $M_n(k)$: for every $N_1, \dots, N_{n+1} \in M_n(k)$,
one has

$$\sum_{\pi \in S_{n+1}} \epsilon(\pi)\,T_\pi(N_1, \dots, N_{n+1}) = 0.$$

Note: Polynomial identities satisfied by every $n \times n$ matrix have
been studied for decades. See [15] for a thorough account. One
should at least mention the theorem of Amitsur and Levitzki:

Theorem 2.9.1 Consider the free algebra $\mathbb{Z}[x_1, \dots, x_r]$ (where
$x_1, \dots, x_r$ are noncommuting indeterminates) and define the standard
polynomial $S_r$ by

$$S_r(x_1, \dots, x_r) := \sum_{\pi \in S_r} \epsilon(\pi)\,x_{\pi(1)} \cdots x_{\pi(r)}.$$

Then, given a commutative ring $A$, one has the polynomial
identity

$$S_{2n}(Q_1, \dots, Q_{2n}) = 0_n, \qquad \forall Q_1, \dots, Q_{2n} \in M_n(A).$$

26. Let $k$ be a field and let $A \in M_n(k)$ be given. For every set
$J \subset \{1, \dots, n\}$, denote by $A_J$ the matrix extracted from $A$ by keeping
only the indices $i, j \in J$. Hence, $A_J \in M_p(k)$ for $p = \mathrm{card}\, J$. Let
$\lambda \in k$.

(a) Assume that for every $J$ whose cardinality is greater than or
equal to $n - p$, $\lambda$ is an eigenvalue of $A_J$. Show that $\lambda$ is an
eigenvalue of $A$, of algebraic multiplicity greater than or equal
to $p + 1$ (express the derivatives of the characteristic polynomial).

(b) Conversely, let $q$ be the geometric multiplicity of $\lambda$ as an eigenvalue
of $A$. Show that if $\mathrm{card}\, J > n - q$, then $\lambda$ is an eigenvalue
of $A_J$.

27. Let $A \in M_n(k)$ and $l \in \mathbb{N}$ be given. Show that there exists a polynomial
$q_l \in k[X]$, of degree at most $n - 1$, such that $A^l = q_l(A)$. If
$A$ is invertible, show that there exists $r_l \in k[X]$, of degree at most
$n - 1$, such that $A^{-l} = r_l(A)$.

28. Let $k$ be a field and $A, B \in M_n(k)$. Assume that $\lambda \neq \mu$ for every
$\lambda \in \mathrm{Sp}\,A$, $\mu \in \mathrm{Sp}\,B$. Show, using the Cayley-Hamilton theorem,
that the linear map $M \mapsto AM - MB$ is an automorphism of $M_n(k)$.

29. Let $k$ be a field and $(M_{jk})_{1 \le j,k \le n}$ a set of matrices of $M_n(k)$, at
least one of which is nonzero, such that $M_{ij}M_{kl} = \delta_{jk}M_{il}$ for all
$1 \le i, j, k, l \le n$.

(a) Show that none of the matrices $M_{jk}$ vanishes.

(b) Verify that each $M_{ii}$ is a projector. Denote its range by $E_i$.

(c) Show that $E_1, \dots, E_n$ are in direct sum. Deduce that each $E_j$
is a line.

(d) Show that there exist generators $e_j$ of each $E_j$ such that
$M_{jk}e_l = \delta_{kl}e_j$.

(e) Deduce that every algebra automorphism of $M_n(k)$ is interior:
for every $\sigma \in \mathrm{Aut}\,M_n(k)$, there exists $P \in GL_n(k)$ such that
$\sigma(M) = P^{-1}MP$ for every $M \in M_n(k)$.
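Several identities in Exercises 19 to 25 can be sanity-checked in exact rational arithmetic before one proves them; a numerical check is of course no proof. The Python sketch below assumes the conventions used above (row $i$ of the Vandermonde matrix holds the $i$th powers, $s_k = a_1^k + \cdots + a_n^k$); all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(a):
    """Exact determinant over Q by Gaussian elimination with row swaps."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return result


def vandermonde(a):
    """Row i (0-based) is (a_1^i, ..., a_n^i), as in Exercises 19 and 20."""
    return [[x ** i for x in a] for i in range(len(a))]


def power_sum_matrix(a):
    """The matrix (s_{i+j}) of Exercise 22, with s_k = a_1^k + ... + a_n^k."""
    n = len(a)
    s = [sum(x ** k for x in a) for k in range(2 * n - 1)]
    return [[s[i + j] for j in range(n)] for i in range(n)]


def diff_product(a):
    """prod_{i<j} (a_j - a_i)."""
    return prod(a[j] - a[i] for i, j in combinations(range(len(a)), 2))


def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def tr2(x):
    """Trace of a 2x2 matrix."""
    return x[0][0] + x[1][1]


def formanek_b(m, n):
    """Left-hand side of the identity in Exercise 25 (b), for 2x2 matrices."""
    mn, nm = mul2(m, n), mul2(n, m)
    c = tr2(m) * tr2(n) - tr2(mn)
    return [[mn[i][j] + nm[i][j] - tr2(m) * n[i][j] - tr2(n) * m[i][j]
             + (c if i == j else 0) for j in range(2)] for i in range(2)]


a = [Fraction(1), Fraction(3), Fraction(-2), Fraction(1, 2)]
assert det(vandermonde(a)) == diff_product(a)             # Vandermonde formula
assert det(vandermonde([2, 5, 2])) == 0                   # two a_j's coincide
assert det(power_sum_matrix(a)) == diff_product(a) ** 2   # Exercise 22
assert formanek_b([[1, 2], [3, 4]], [[0, -1], [5, 2]]) == [[0, 0], [0, 0]]
```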




3 

Matrices with Real or Complex Entries 



Definitions 

A square matrix $M \in M_n(\mathbb{R})$ is said to be normal if $M$ and $M^T$ commute:
$M^TM = MM^T$. The real symmetric, skew-symmetric, and orthogonal
matrices are normal.

In considering matrices with complex entries, a useful operation is
complex conjugation $z \mapsto \bar z$. One denotes by $\bar M$ the matrix obtained from $M$
by conjugating the entries. We then define the Hermitian adjoint matrix¹
$M^*$ by

$$M^* := (\bar M)^T = \overline{M^T}.$$

One therefore has $m^*_{ij} = \overline{m_{ji}}$ and $\det M^* = \overline{\det M}$. The map $M \mapsto M^*$
is an anti-isomorphism, which means that it is antilinear (meaning that
$(\lambda M)^* = \bar\lambda M^*$) and satisfies, moreover, the product formula

$$(MN)^* = N^*M^*.$$

When a square matrix $M \in M_n(\mathbb{C})$ is invertible, then $(M^*)^{-1} = (M^{-1})^*$.
This matrix is sometimes denoted by $M^{-*}$.

One says that a square matrix $M \in M_n(\mathbb{C})$ is Hermitian if $M^* = M$ and
skew-Hermitian if $M^* = -M$. If $M \in M_{n\times m}(\mathbb{C})$, the matrices $MM^*$ and
$M^*M$ are Hermitian. We denote by $\mathbf{H}_n$ the set of Hermitian matrices in
$M_n(\mathbb{C})$. It is an $\mathbb{R}$-linear subspace of $M_n(\mathbb{C})$, though it is not a $\mathbb{C}$-linear
subspace, since $iM$ is skew-Hermitian when $M$ is Hermitian.

¹We warn the reader about the possible confusion between the adjoint and the
Hermitian adjoint of a matrix. One may remark that the Hermitian adjoint is defined for
every rectangular matrix with complex entries, while the adjoint is defined for every
square matrix with entries in a commutative ring.

A square matrix $M \in M_n(\mathbb{C})$ is said to be unitary if $M^*M = I_n$. Since
this means that $M$ is invertible, with inverse $M^*$, and since the left and
the right inverses are equal, an equivalent criterion is $MM^* = I_n$. The
set of unitary matrices in $M_n(\mathbb{C})$ forms a multiplicative group, denoted
by $\mathbf{U}_n$. Unitary matrices satisfy $|\det M| = 1$, since $\det(M^*M) = |\det M|^2$
for every matrix $M$. The set of unitary matrices whose determinant equals
1, denoted by $\mathbf{SU}_n$, is obviously a normal subgroup of $\mathbf{U}_n$. Finally, $M$ is
said to be normal if $M$ and $M^*$ commute: $MM^* = M^*M$. The Hermitian,
skew-Hermitian, and unitary matrices are normal.

Observe that the real orthogonal (respectively symmetric, skew-symmetric)
matrices are unitary (respectively Hermitian, skew-Hermitian).
Conversely, if $M$ is real and unitary (respectively Hermitian, skew-Hermitian),
then $M$ is orthogonal (respectively symmetric, skew-symmetric).

A sesquilinear form on a complex vector space is a map

$$(x, y) \mapsto \langle x, y \rangle,$$

linear in $x$ and satisfying

$$\langle y, x \rangle = \overline{\langle x, y \rangle}.$$

It is thus antilinear in $y$:

$$\langle x, \lambda y \rangle = \bar\lambda \langle x, y \rangle.$$

When $y = x$, $\langle x, x \rangle$ is a real number. The map $x \mapsto \langle x, x \rangle$ is called
a Hermitian form. The correspondence between sesquilinear and Hermitian
forms is one-to-one.

Given a matrix $M \in M_n(\mathbb{C})$, the form

$$(x, y) \mapsto \sum_{j,k} m_{jk} x_j \bar y_k,$$

defined on $\mathbb{C}^n \times \mathbb{C}^n$, is sesquilinear if and only if $M$ is Hermitian. It follows
that there is an isomorphism between the sets of Hermitian matrices,
Hermitian forms, and sesquilinear forms on $\mathbb{C}^n$. As a matter of fact, a Hermitian
form can be written in the form

$$x \mapsto \sum_{j,k} m_{jk} x_j \bar x_k.$$

The kernel of a Hermitian or a sesquilinear form is the set of vectors
$x \in E$ such that $\langle x, y \rangle = 0$ for every $y \in E$. It equals the set of vectors
$y \in E$ such that $\langle x, y \rangle = 0$ for every $x \in E$. If $E = \mathbb{C}^n$, it is also the kernel
of $M^T$, where $M$ is the (Hermitian) matrix associated to the Hermitian
form. One says that the Hermitian form is degenerate if its kernel does not
reduce to $\{0\}$. When $E = \mathbb{C}^n$, this amounts to $\det M = 0$. One says that
the form is nondegenerate otherwise.

If both $E$ and $F$ are endowed with nondegenerate sesquilinear forms
$\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_F$ respectively, and if $u \in \mathcal{L}(E, F)$, one defines $u^*$ by the formula

$$\langle u^*(x), y \rangle_E = \langle x, u(y) \rangle_F, \qquad \forall x \in F,\ y \in E.$$

The map $u \mapsto u^*$ is an $\mathbb{R}$-isomorphism from $\mathcal{L}(E, F)$ onto $\mathcal{L}(F, E)$, and
one has $(\lambda u)^* = \bar\lambda u^*$, $(u^*)^* = u$. When $E = \mathbb{C}^n$ and $F = \mathbb{C}^m$ are endowed
with the canonical sesquilinear forms $x_1\bar y_1 + \cdots$, the matrix associated
to $u^*$ is simply the Hermitian adjoint of the matrix associated to $u$. The
canonical Hermitian form over $\mathbb{C}^n$ is positive definite: $\langle x, x \rangle > 0$ if $x \neq 0$. It
allows us to define a norm by $\|x\| = \sqrt{\langle x, x \rangle}$. Identifying $\mathbb{C}^n$ with column
vectors, one also defines $\|X\| = \sqrt{X^*X}$ if $X \in M_{n\times 1}(\mathbb{C})$. This norm will
be denoted by $\|\cdot\|_2$ in Chapter 4. A matrix is unitary if and only if it is
associated with an isometry of $\mathbb{C}^n$:

$$\|u(x)\| = \|x\|, \qquad \forall x \in \mathbb{C}^n.$$

More generally, let $M$ be a Hermitian matrix and $\langle\cdot,\cdot\rangle$ the form that it
defines on $\mathbb{C}^n$. One says that $M$ is positive definite if $\langle x, x \rangle > 0$ for every
$x \neq 0$. Again, $\sqrt{\langle x, x \rangle}$ is a norm on $\mathbb{C}^n$. We shall denote by $\mathbf{HPD}_n$
the set of the positive definite Hermitian matrices; it is an open cone in
$\mathbf{H}_n$. Its closure consists of the Hermitian matrices $M$ that define a positive
semidefinite Hermitian form over $\mathbb{C}^n$ ($\langle x, x \rangle \ge 0$ for every $x$). They
are called positive semidefinite Hermitian matrices. One defines similarly,
among the real symmetric matrices, those that are positive definite, respectively
positive semidefinite. The positive definite real symmetric matrices
form an open cone in $\mathrm{Sym}_n(\mathbb{R})$, denoted by $\mathbf{SPD}_n$.

The natural ordering on Hermitian forms induces an ordering on Hermitian
matrices. One writes $H \ge 0_n$ when the Hermitian form associated to $H$
takes nonnegative values. More generally, one writes $H \ge h$ if $H - h \ge 0_n$.
We likewise define an ordering on real-valued symmetric matrices, referring
to the ordering on real-valued quadratic forms.²

If $U$ is unitary, the matrix $U^*MU$ is similar to $M$. If $M$ is Hermitian,
skew-Hermitian, normal, or unitary and if $U$ is unitary, then $U^*MU$ is still
Hermitian, skew-Hermitian, normal, or unitary.

²We warn the reader that another, completely different, order still denoted by the
symbol $\ge$ will be defined in Chapter 5. This one will concern real-valued matrices that
are not necessarily symmetric, nor even square. We expect the context never to be ambiguous.
3.1 Eigenvalues of Real- and Complex-Valued Matrices

Since $\mathbb{C}$ is algebraically closed, every complex-valued square matrix, and
every endomorphism of a $\mathbb{C}$-vector space of dimension $n \ge 1$, possesses
eigenvalues. As a matter of fact, the characteristic polynomial has roots.
A real-valued square matrix may not have eigenvalues in $\mathbb{R}$, but it has at
least one in $\mathbb{C}$. If $n$ is odd, $M \in M_n(\mathbb{R})$ has at least one real eigenvalue,
because $P_M$ is real of odd degree.

Proposition 3.1.1 The eigenvalues of Hermitian matrices, as well as
those of real symmetric matrices, are real.

Proof

Let $M \in M_n(\mathbb{C})$ be a Hermitian matrix and let $\lambda$ be one of its eigenvalues.
Let us choose an eigenvector $X$: $MX = \lambda X$. Taking the Hermitian
adjoint, we obtain $X^*M = \bar\lambda X^*$. Hence,

$$\lambda X^*X = X^*(MX) = (X^*M)X = \bar\lambda X^*X,$$

or

$$(\lambda - \bar\lambda)X^*X = 0.$$

However $X^*X = \sum_j |x_j|^2 > 0$. Therefore, we are left with $\lambda - \bar\lambda = 0$. Hence
$\lambda$ is real.

■

We leave it to the reader to show, as an exercise, that the eigenvalues of 
skew-Hermitian matrices are purely imaginary. 

Proposition 3.1.2 The eigenvalues of the unitary matrices, as well as
those of real orthogonal matrices, are complex numbers of modulus one.

Proof

As before, if $X$ is an eigenvector associated to $\lambda$, one has

$$|\lambda|^2\|X\|^2 = (\lambda X)^*(\lambda X) = (MX)^*MX = X^*M^*MX = X^*X = \|X\|^2,$$

and therefore $|\lambda|^2 = 1$.

■
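A quick floating-point illustration of Propositions 3.1.1 and 3.1.2, and of the exercise on skew-Hermitian matrices, assuming NumPy is available. We deliberately call the general solver numpy.linalg.eigvals, which exploits no symmetry, so that the location of the spectrum is observed rather than imposed; the random test matrices are our own choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

H = Z + Z.conj().T          # Hermitian: H* = H
S = Z - Z.conj().T          # skew-Hermitian: S* = -S
U, _ = np.linalg.qr(Z)      # the Q factor of a QR factorization is unitary

eig_H = np.linalg.eigvals(H)
eig_S = np.linalg.eigvals(S)
eig_U = np.linalg.eigvals(U)

print(np.max(np.abs(eig_H.imag)))         # rounding level: real spectrum
print(np.max(np.abs(eig_S.real)))         # rounding level: purely imaginary
print(np.max(np.abs(np.abs(eig_U) - 1)))  # rounding level: modulus one
```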



3.1.1 Continuity of Eigenvalues 

One of the more delicate statements in the elementary theory of matrices
concerns the continuity of the eigenvalues. Though a proof might be provided
through explicit bounds, it is easier to use Rouché’s theorem about
holomorphic functions. We begin with a statement concerning polynomials,
which is a bit less precise than Rouché’s theorem.
Theorem 3.1.1 Let $n \in \mathbb{N}$ and let $P \in \mathbb{C}[X]$ be a polynomial of degree
$n$,

$$P(X) = p_0 + p_1X + \cdots + p_nX^n.$$

Let $x$ be a root of $P$, with multiplicity $\mu$, and let $d$ be the distance from $x$ to
the other roots of $P$. Let $D$ be an open disk, $D = D(x; \rho)$, with $0 < \rho < d$.
Then there exists a number $\epsilon > 0$ such that if $Q \in \mathbb{C}[X]$ has degree $n$,

$$Q(X) = q_0 + q_1X + \cdots + q_nX^n,$$

and if

$$\max_j |q_j - p_j| < \epsilon,$$

then $D$ contains exactly $\mu$ roots of $Q$, counting multiplicities.



Let us apply this result to the characteristic polynomial of a given matrix.
Since the coefficients of the characteristic polynomial $P_M$ are polynomial
functions of the entries of $M$, the map $M \mapsto P_M$ is continuous from $M_n(\mathbb{C})$
to the set of polynomials of degree $n$. From Rouché’s theorem, we have the
following result.

Theorem 3.1.2 Let $M \in M_n(\mathbb{C})$, let $\lambda$ be one of its eigenvalues, with
multiplicity $\mu$, and let $d$ be the distance from $\lambda$ to the other eigenvalues of
$M$. Let $D$ be an open disk, $D = D(\lambda; \rho)$, with $0 < \rho < d$. Let us fix a norm
on $M_n(\mathbb{C})$.

There exists an $\epsilon > 0$ such that if $A \in M_n(\mathbb{C})$ and $\|A\| < \epsilon$, the sum of
algebraic multiplicities of the eigenvalues of $M + A$ in $D$ equals $\mu$.



Let us remark that this statement becomes false if one considers the 
geometric multiplicities. 

One often invokes this theorem by saying that the eigenvalues of a matrix
are continuous functions of its entries. Here is an interpretation. One
adapts the Hausdorff distance between compact sets so as to take into account
the multiplicity of the eigenvalues. If $M, N \in M_n(\mathbb{C})$, let us denote
by $(\lambda_1, \dots, \lambda_n)$ and $(\theta_1, \dots, \theta_n)$ their eigenvalues, repeated according to
their multiplicities. One then defines

$$d(\mathrm{Sp}\,M, \mathrm{Sp}\,N) := \inf_{\sigma \in S_n} \max_j |\lambda_j - \theta_{\sigma(j)}|,$$

where $S_n$ is the group of permutations of the indices $\{1, \dots, n\}$. This number
is called the distance between the spectra of $M$ and $N$. With this
notation, one may rewrite Theorem 3.1.2 in the following form.

Proposition 3.1.3 If $M \in M_n(\mathbb{C})$ and $\alpha > 0$, there exists $\epsilon > 0$ such
that $\|N - M\| < \epsilon$ implies $d(\mathrm{Sp}\,M, \mathrm{Sp}\,N) < \alpha$.
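The distance between spectra can be evaluated by brute force for small $n$. The Python sketch below assumes NumPy; `spectral_distance` is a hypothetical helper that enumerates all $n!$ permutations, which is only reasonable for tiny matrices. The Jordan block example is ours: the eigenvalues of $J + E$ are $\pm\sqrt{\epsilon}$, so the spectrum moves continuously but not Lipschitz-continuously, and $J + E$ has two eigenvectors near a point where $J$ has only one, in line with the remark on geometric multiplicities.

```python
import itertools

import numpy as np


def spectral_distance(M, N):
    """min over permutations s of max_j |lambda_j - theta_s(j)| (brute force)."""
    lam = np.linalg.eigvals(M)
    theta = np.linalg.eigvals(N)
    n = len(lam)
    return min(
        max(abs(lam[j] - theta[s[j]]) for j in range(n))
        for s in itertools.permutations(range(n))
    )


J = np.array([[0.0, 1.0], [0.0, 0.0]])      # Jordan block, eigenvalue 0 twice
for eps in [1e-2, 1e-4, 1e-6]:
    E = np.array([[0.0, 0.0], [eps, 0.0]])  # a perturbation of size eps
    # Sp(J + E) = {+sqrt(eps), -sqrt(eps)}, hence a distance of sqrt(eps).
    print(eps, spectral_distance(J, J + E))
```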

A useful consequence of Theorem 3.1.2 is the following. 
Corollary 3.1.1 In $M_n(k)$ ($k = \mathbb{R}$ or $\mathbb{C}$) the set of diagonalizable
matrices with pairwise distinct eigenvalues is an open subset.



3.1.2 Trigonalization in an Orthonormal Basis 

From now on we say that two matrices are unitarily similar if they are
similar through a unitary transformation. Two real matrices are said to be
unitarily similar if they are similar through an orthogonal transformation.

If $K = \mathbb{C}$, one may sharpen Theorem 2.7.1:

Theorem 3.1.3 (Schur) If $M \in M_n(\mathbb{C})$, there exists a unitary matrix
$U$ such that $U^*MU$ is upper triangular.

One also says that every matrix with complex entries is unitarily
trigonalizable.

Proof

We proceed by induction on the size $n$ of the matrices. The statement is
trivial if $n = 1$. Let us assume that it is true in $M_{n-1}(\mathbb{C})$, with $n \ge 2$. Let
$M \in M_n(\mathbb{C})$ be a matrix. Since $\mathbb{C}$ is algebraically closed, $M$ has at least
one eigenvalue $\lambda$. Let $X$ be an eigenvector associated to $\lambda$. By dividing $X$
by $\|X\|$, one can assume that $X$ is a unit vector. One can then find an
orthonormal basis $\{X^1, \dots, X^n\}$ of $\mathbb{C}^n$ whose first element is $X$. Let
us consider the matrix $V := (X^1 = X, \dots, X^n)$, which is unitary, and
let us form the matrix $M' := V^*MV$. Since

$$VM'e^1 = MVe^1 = MX = \lambda X = \lambda Ve^1,$$

one obtains $M'e^1 = \lambda e^1$. In other words, $M'$ has the block-triangular form

$$M' = \begin{pmatrix} \lambda & \cdots \\ 0_{n-1} & N \end{pmatrix},$$

where $N \in M_{n-1}(\mathbb{C})$. Applying the induction hypothesis, there exists
$W \in \mathbf{U}_{n-1}$ such that $W^*NW$ is upper triangular. Let us denote by $\hat W$
the (block-diagonal) matrix $\mathrm{diag}(1, W) \in \mathbf{U}_n$. Then $\hat W^*M'\hat W$ is upper
triangular. Hence, $U = V\hat W$ satisfies the conditions of the theorem.

■
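The inductive proof translates almost literally into a numerical procedure. The following sketch assumes NumPy; `schur_by_deflation` is a hypothetical name. The orthonormal completion of $X$ is taken from a QR factorization of the $n \times (n+1)$ matrix $(X \mid I_n)$, whose first column is $X$ times a unimodular scalar, hence still a unit eigenvector. This illustrates the proof only; library routines based on the QR algorithm, such as scipy.linalg.schur, are what one uses in practice.

```python
import numpy as np


def schur_by_deflation(M):
    """Return a unitary U such that U* M U is upper triangular.

    Mirrors the proof: one unit eigenvector X, a unitary V with first
    column proportional to X, then recursion on the block N of V* M V.
    """
    M = np.asarray(M, dtype=complex)
    n = M.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    _, vecs = np.linalg.eig(M)                  # any eigenpair will do
    X = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
    V, _ = np.linalg.qr(np.column_stack([X, np.eye(n, dtype=complex)]))
    Mp = V.conj().T @ M @ V                     # M' = V* M V
    W = schur_by_deflation(Mp[1:, 1:])          # induction hypothesis on N
    W_hat = np.eye(n, dtype=complex)
    W_hat[1:, 1:] = W                           # W_hat = diag(1, W)
    return V @ W_hat


rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
U = schur_by_deflation(M)
T = U.conj().T @ M @ U
print(np.allclose(U.conj().T @ U, np.eye(5)))   # U is unitary
print(np.max(np.abs(np.tril(T, -1))))           # below the diagonal: rounding level
```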



3.2 Spectral Decomposition of Normal Matrices 

We recall that a matrix $M$ is normal if $M^*$ commutes with $M$. For real
matrices, this amounts to saying that $M^T$ commutes with $M$. Since it is
equivalent for a Hermitian matrix $H$ to be zero or to satisfy $x^*Hx = 0$ for
every vector $x$, we see that $M$ is normal if and only if $\|Mx\|_2 = \|M^*x\|_2$
for every vector $x$, where $\|x\|_2$ denotes the standard Hermitian (Euclidean)
norm (take $H = MM^* - M^*M$).
Theorem 3.2.1 If $K = \mathbb{C}$, the normal matrices are diagonalizable, using
unitary matrices:

$$(M^*M = MM^*) \iff \left(\exists U \in \mathbf{U}_n;\ M = U^{-1}\,\mathrm{diag}(d_1, \dots, d_n)\,U\right).$$

Again, one says that normal matrices are unitarily diagonalizable. This
theorem contains the following properties.

Corollary 3.2.1 Unitary, Hermitian, and skew-Hermitian matrices are
unitarily diagonalizable.

Observe that among normal matrices one distinguishes each of the above
families by the nature of their eigenvalues. Those of unitary matrices have
modulus one, while those of Hermitian matrices are real. Finally, those of
skew-Hermitian matrices are purely imaginary.

Proof

We proceed by induction on the size $n$ of the matrix $M$. If $n = 0$, there
is nothing to prove. Otherwise, if $n \ge 1$, there exists an eigenpair $(\lambda, x)$:

$$Mx = \lambda x, \qquad \|x\|_2 = 1.$$

Since $M$ is normal, $M - \lambda I_n$ is, too. From above, we see that
$\|(M^* - \bar\lambda)x\|_2 = \|(M - \lambda)x\|_2 = 0$, and hence $M^*x = \bar\lambda x$. Let $V$ be a unitary
matrix such that $Ve^1 = x$. Then the matrix $M_1 := V^*MV$ is normal and satisfies
$M_1e^1 = \lambda e^1$. By the same argument, it satisfies $M_1^*e^1 = \bar\lambda e^1$. This amounts
to saying that the first column of $M_1$ is $\lambda e^1$ and its first row is $\lambda (e^1)^T$, that is,
$M_1$ is block-diagonal, of the form $M_1 = \mathrm{diag}(\lambda, M')$. Obviously, $M'$ inherits
the normality of $M_1$. From the induction hypothesis, $M'$, and therefore $M_1$
and $M$, are unitarily diagonalizable.

■

One observes that the same matrix $U$ diagonalizes $M^*$, because
$M = U^{-1}DU$ implies $M^* = U^*D^*U^{-1*} = U^{-1}D^*U$, since $U$ is unitary.
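A small numerical illustration of Theorem 3.2.1 and of the last remark, assuming NumPy. The test matrix is our own choice: a real normal matrix of the form $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ with eigenvalues $1 \pm 2i$. Since these eigenvalues are distinct and $M$ is normal, the unit eigenvectors returned by numpy.linalg.eig are orthogonal, so the eigenvector matrix is unitary; it plays the role of $U^{-1} = U^*$ in the theorem.

```python
import numpy as np

M = np.array([[1.0, -2.0], [2.0, 1.0]])     # normal, neither symmetric nor skew
assert np.allclose(M @ M.T, M.T @ M)

w, P = np.linalg.eig(M)                     # columns of P: unit eigenvectors
D = np.diag(w)
print(np.allclose(P.conj().T @ P, np.eye(2)))          # P is unitary
print(np.allclose(M, P @ D @ P.conj().T))              # M = P D P*
print(np.allclose(M.T, P @ D.conj() @ P.conj().T))     # same P diagonalizes M*
```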

Let us consider the case of a positive semidefinite Hermitian matrix $H$. If
$HX = \lambda X$ with $X \neq 0$, then $0 \le X^*HX = \lambda\|X\|^2$. The eigenvalues are thus
nonnegative. Let $\lambda_1, \dots, \lambda_p$ be the nonzero eigenvalues of $H$, repeated according
to their multiplicities. Then $H$ is unitarily similar to

$$D := \mathrm{diag}(\lambda_1, \dots, \lambda_p, 0, \dots, 0).$$

From this, we conclude that $\mathrm{rk}\,H = p$. Let $U \in \mathbf{U}_n$ be such that
$H = UDU^*$. Defining the vectors $X_\alpha = \sqrt{\lambda_\alpha}\,U_\alpha$, where the $U_\alpha$ are the columns
of $U$, we obtain the following statement.

Proposition 3.2.1 Let $H \in M_n(\mathbb{C})$ be a positive semidefinite Hermitian
matrix. Let $p$ be its rank. Then $H$ has $p$ real, positive eigenvalues (counted
with multiplicity), while the eigenvalue $\lambda = 0$ has multiplicity $n - p$. There
exist $p$ column vectors $X_\alpha$, pairwise orthogonal, such that

$$H = X_1X_1^* + \cdots + X_pX_p^*.$$

Finally, $H$ is positive definite if and only if $p = n$ (in which case, $\lambda = 0$ is
not an eigenvalue).
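The construction of the $X_\alpha$ can be followed step by step in floating point. The sketch below assumes NumPy; the test matrix $H = BB^*$ with $B$ a random $5 \times 3$ complex matrix (hence of rank 3 for generic $B$) and the relative threshold deciding which computed eigenvalues count as zero are our own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
B = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
H = B @ B.conj().T                   # Hermitian, positive semidefinite

lam, U = np.linalg.eigh(H)           # real eigenvalues (increasing), U unitary
tol = 1e-10 * lam.max()              # numerical threshold for "nonzero"
keep = lam > tol
rank = np.count_nonzero(keep)

# X_alpha = sqrt(lambda_alpha) U_alpha for each nonzero eigenvalue.
X = U[:, keep] * np.sqrt(lam[keep])
H_rebuilt = sum(np.outer(X[:, a], X[:, a].conj()) for a in range(rank))

print(rank)                                              # p, for a generic B
print(np.allclose(H, H_rebuilt))                         # H = sum X_a X_a*
print(np.allclose(X.conj().T @ X, np.diag(lam[keep])))   # X_a pairwise orthogonal
```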
3.3 Normal and Symmetric Real-Valued Matrices

The situation is a bit more involved if $M$, a normal matrix, has real entries.
Of course, one can consider $M$ as a matrix with complex entries and
diagonalize it in an orthonormal basis, but in general we leave the field of
real numbers when doing so. We prefer to allow only bases consisting of
real vectors. Since some of the eigenvalues might be nonreal, one cannot in
general diagonalize $M$. The statement is thus the following.

Theorem 3.3.1 Let $M \in M_n(\mathbb{R})$ be a normal matrix. There exists an orthogonal
matrix $O$ such that $OMO^{-1}$ is block-diagonal, the diagonal blocks
being $1 \times 1$ (those corresponding to the real eigenvalues of $M$) or $2 \times 2$, the
latter being matrices of direct similitude:³

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}.$$

Similarly, $OM^TO^{-1}$ is block-diagonal, the diagonal blocks being eigenvalues
or matrices of direct similitude.

Proof

One again proceeds by induction on $n$. When $n \ge 1$, the proof is the same
as in the previous section whenever $M$ has at least one real eigenvalue.

If this is not the case, then $n$ is even. Let us first consider the case $n = 2$.
Then

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Since $M$ is normal, we have $b^2 = c^2$ and $(a - d)(b - c) = 0$. However,
$b \neq c$, since otherwise $M$ would be symmetric, hence would have two real
eigenvalues. Hence $b = -c$ and $a = d$.

Now let us consider the general case, with $n \ge 4$. We know that $M$ has
an eigenpair $(\lambda, z)$, where $\lambda$ is not real. If the real and imaginary parts of
$z$ were colinear, $M$ would have a real eigenvector, hence a real eigenvalue,
a contradiction. In other words, the real and imaginary parts of $z$ span a
plane $P$ in $\mathbb{R}^n$. As before, $Mz = \lambda z$ implies $M^Tz = \bar\lambda z$. Hence we have
$MP \subset P$ and $M^TP \subset P$. Now let $V$ be an orthogonal matrix that maps
the plane $P_0 := \mathbb{R}e^1 \oplus \mathbb{R}e^2$ onto $P$. Then the matrix $M_1 := V^TMV$ is
normal and satisfies

$$M_1P_0 \subset P_0, \qquad M_1^TP_0 \subset P_0.$$

This means that $M_1$ is block-diagonal. Of course, each diagonal block (of
sizes $2 \times 2$ and $(n - 2) \times (n - 2)$) inherits the normality of $M_1$. Applying the
induction hypothesis, we know that these blocks are unitarily similar to a
block-diagonal matrix whose diagonal blocks are direct similitudes. Hence
$M_1$ and $M$ are unitarily similar to such a matrix.

■

³A similitude is an endomorphism of a Euclidean space that preserves angles. It splits
as $aR$, where $R$ is orthogonal and $a$ is a scalar. It is direct if its determinant is positive.

Corollary 3.3.1 Real symmetric matrices are diagonalizable over $\mathbb{R}$,
through orthogonal conjugation. In other words, given $M \in \mathrm{Sym}_n(\mathbb{R})$,
there exists an $O \in \mathbf{O}_n(\mathbb{R})$ such that $OMO^{-1}$ is diagonal.

In fact, since the eigenvalues of $M$ are real, $OMO^{-1}$ has only $1 \times 1$ blocks.
We say that real symmetric matrices are orthogonally diagonalizable.

The interpretation of this statement in terms of quadratic forms is the
following. For every quadratic form $Q$ on $\mathbb{R}^n$, there exists an orthonormal
basis $\{e_1, \dots, e_n\}$ in which this form can be written with at most $n$
squares:⁴

$$Q(x) = \sum_{i=1}^n a_i x_i^2.$$

Replacing the basis vector $e_j$ by $|a_j|^{-1/2}e_j$ whenever $a_j \neq 0$, and reordering
the basis, one sees that there also exists an orthogonal basis in which the
quadratic form can be written

$$Q(x) = \sum_{i=1}^r x_i^2 - \sum_{j=1}^s x_{j+r}^2,$$

with $r + s \le n$. This quadratic form is nondegenerate if and only if $r + s = n$.
The pair $(r, s)$ is unique and called the signature or the Sylvester index of
the quadratic form. In such a basis, the matrix associated to $Q$ is

$$\mathrm{diag}(I_r, -I_s, 0_{n-r-s}).$$
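Since $Q$ is diagonal in an orthonormal eigenbasis of its matrix $S$, the signature can be read off the signs of the eigenvalues of $S$. A minimal NumPy sketch; the function name `signature` and the relative tolerance used to decide which computed eigenvalues are zero are our own assumptions, and the uniqueness of $(r, s)$ is what makes this count meaningful.

```python
import numpy as np


def signature(S, rtol=1e-10):
    """Sylvester index (r, s) of x -> x^T S x, for S real symmetric.

    r (resp. s) counts the positive (resp. negative) eigenvalues of S;
    eigenvalues below rtol * max(1, max |eigenvalue|) are treated as zero.
    """
    w = np.linalg.eigvalsh(S)
    tol = rtol * max(1.0, np.max(np.abs(w)))
    return int(np.sum(w > tol)), int(np.sum(w < -tol))


# Q(x) = x1^2 + 4 x1 x2 + x2^2 - x3^2: eigenvalues 3, -1, -1.
S = np.array([[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
print(signature(S))   # (1, 2): nondegenerate, since r + s = 3
```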




3.3.1 Rayleigh Quotients 

Let $M$ be a real $n \times n$ symmetric matrix, and let $\lambda_1 \le \cdots \le \lambda_n$ be its
eigenvalues arranged in increasing order and counted with multiplicity. Let
us denote by $\mathcal{B} = \{v_1, \dots, v_n\}$ an orthonormal eigenbasis ($Mv_j = \lambda_jv_j$).
If $x \in \mathbb{R}^n$, let us denote by $y_1, \dots, y_n$ the coordinates of $x$ in the basis $\mathcal{B}$.
Finally, let us denote by $\|\cdot\|_2$ the usual Euclidean norm on $\mathbb{R}^n$. Then

$$x^TMx = \sum_j \lambda_j y_j^2 \le \lambda_n \sum_j y_j^2 = \lambda_n\|x\|_2^2.$$

⁴In solid mechanics, when $Q$ is the matrix of inertia, the vectors of this basis are
along the inertia axes, and the $a_j$, which then are positive, are the moments of inertia.

Since $v_n^TMv_n = \lambda_n\|v_n\|_2^2$, we deduce the value of the largest eigenvalue of
$M$:

$$\lambda_n = \max_{x \neq 0} \frac{x^TMx}{\|x\|_2^2} = \max\{x^TMx \mid \|x\|_2 = 1\}. \tag{3.1}$$

Similarly, the smallest eigenvalue of a real symmetric matrix is given by

$$\lambda_1 = \min_{x \neq 0} \frac{x^TMx}{\|x\|_2^2} = \min\{x^TMx \mid \|x\|_2 = 1\}. \tag{3.2}$$

For a Hermitian matrix, the formulas (3.1) and (3.2) remain valid when we replace
$x^T$ by $x^*$.

We evaluate the other eigenvalues of M G Sym„(iR) in the following 
way. For every linear subspace F of IR^ of dimension k, let us define 

]\/f nr 

R(F) = max = max | x G F, ||x ||2 = l| . 

xeF\{o} \\x\\l '' > 

The intersection of F with the linear subspace spanned by {vk , . . . , u„} is 
of dimension greater than or equal to one. There exists, therefore, a nonzero 
vector X € F such that yi = ■ ■ ■ = yk-i = 0. One has then 



"Mx = ^jVj yj = •^'=11 

j=k j 



xwl- 



Hence, $R(F) \ge \lambda_k$. Furthermore, if $G$ is the space spanned by $\{v_1, \dots, v_k\}$, one has $R(G) = \lambda_k$. Thus, we have

$$\lambda_k = \min\{R(F) \mid \dim F = k\}.$$

Finally, we may state the following theorem. 



Theorem 3.3.2 Let $M$ be an $n \times n$ real symmetric matrix and $\lambda_1, \dots, \lambda_n$ its eigenvalues arranged in increasing order, counted with multiplicity. Then

$$\lambda_k = \min_{\dim F = k} \max_{x \in F \setminus \{0\}} \frac{x^T M x}{\|x\|_2^2}.$$

If $M$ is complex Hermitian, one has similarly

$$\lambda_k = \min_{\dim F = k} \max_{x \in F \setminus \{0\}} \frac{x^* M x}{\|x\|_2^2}.$$

This formula generalizes (3.1) and (3.2).
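As a quick numerical illustration of (3.1) and (3.2), the following Python sketch (standard library only; the helper names `rayleigh` and `eig_sym_2x2` and the sample matrix are ours, chosen for illustration) samples the Rayleigh quotient of a $2 \times 2$ real symmetric matrix on the unit circle and compares its extreme values with the exact eigenvalues. It is a sanity check under these assumptions, not a proof.

```python
import math

def rayleigh(M, x):
    """Rayleigh quotient x^T M x / ||x||_2^2 (M a list of rows, x a list)."""
    n = len(x)
    Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Mx[i] for i in range(n)) / sum(v * v for v in x)

def eig_sym_2x2(M):
    """Exact eigenvalues lambda_1 <= lambda_2 of a real symmetric 2x2 matrix."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - rad, mean + rad

M = [[2.0, 1.0], [1.0, 3.0]]
lam1, lam2 = eig_sym_2x2(M)          # (5 - sqrt 5)/2 and (5 + sqrt 5)/2

# Sample x = (cos t, sin t) for t in [0, pi): half of the unit circle is enough.
qs = [rayleigh(M, [math.cos(t), math.sin(t)])
      for t in (k * math.pi / 10000 for k in range(10000))]
print(min(qs), lam1)   # the minimum of the quotient approaches lambda_1, cf. (3.2)
print(max(qs), lam2)   # the maximum of the quotient approaches lambda_2, cf. (3.1)
```

Sampling $t \in [0, \pi)$ suffices because the Rayleigh quotient is unchanged when $x$ is replaced by $-x$.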
3.3.2 Applications

Theorem 3.3.3 Let $H \in \mathbf{H}_{n-1}$, $x \in \mathbb{C}^{n-1}$ and $a \in \mathbb{R}$ be given. Let $\lambda_1 \le \cdots \le \lambda_{n-1}$ be the eigenvalues of $H$ and $\mu_1 \le \cdots \le \mu_n$ those of the Hermitian matrix

$$H' = \begin{pmatrix} H & x \\ x^* & a \end{pmatrix}.$$

One has then $\mu_1 \le \lambda_1 \le \cdots \le \mu_j \le \lambda_j \le \mu_{j+1} \le \cdots$.

Proof 

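By Theorem 3.3.2, the inequality $\mu_j \le \lambda_j$ is obvious, because the infimum is taken over a smaller set: the subspaces of $\mathbb{C}^{n-1} \times \{0\}$ form a subset of those of $\mathbb{C}^n$, and on them the Rayleigh quotients of $H'$ and of $H$ coincide.

Conversely, let $\pi : x \mapsto (x_1, \dots, x_{n-1})^T$ be the projection from $\mathbb{C}^n$ onto $\mathbb{C}^{n-1}$. If $F$ is a linear subspace of $\mathbb{C}^n$ of dimension $j + 1$, its intersection with the hyperplane $\{x_n = 0\}$ has dimension at least $j$. The image of this intersection under $\pi$ therefore contains a linear subspace $G$ of dimension $j$, and for $x$ in the intersection one has $x^* H' x = (\pi x)^* H (\pi x)$. By Theorem 3.3.2, applied to $H$, one therefore has

$$R(F) \ge R(G) \ge \lambda_j,$$

where $R(F)$ is computed with $H'$ and $R(G)$ with $H$. Taking the infimum, we obtain $\mu_{j+1} \ge \lambda_j$.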
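Theorem 3.3.3 can be checked numerically. The sketch below is an illustration under stated assumptions, in plain Python: `jacobi_eigenvalues` is our own minimal implementation of the cyclic Jacobi rotation method (not a library routine), and the bordered matrix is random. It computes the spectra of $H'$ and of its leading $(n-1) \times (n-1)$ block $H$ and tests the interlacing inequalities up to a rounding tolerance.

```python
import math
import random

def jacobi_eigenvalues(A, tol=1e-22, max_sweeps=50):
    """Sorted eigenvalues of a real symmetric matrix (list of rows), computed by
    the cyclic Jacobi rotation method. A minimal sketch, not a library routine."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new (p, q) entry vanishes.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # columns p, q:  a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rows p, q:     a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def interlaces(mu, lam, eps=1e-9):
    """mu_1 <= lam_1 <= mu_2 <= ... <= lam_{n-1} <= mu_n, up to eps."""
    return all(mu[j] <= lam[j] + eps and lam[j] <= mu[j + 1] + eps
               for j in range(len(lam)))

random.seed(0)
n = 5
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
H_prime = [[B[i][j] + B[j][i] for j in range(n)] for i in range(n)]  # H' symmetric
H = [row[:-1] for row in H_prime[:-1]]   # delete the last row and column

mu = jacobi_eigenvalues(H_prime)
lam = jacobi_eigenvalues(H)
print(interlaces(mu, lam))   # Theorem 3.3.3 predicts True
```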



The previous theorem is optimal, in the following sense. 

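Theorem 3.3.4 Let $\lambda_1 \le \cdots \le \lambda_{n-1}$ and $\mu_1 \le \cdots \le \mu_n$ be real numbers satisfying $\mu_1 \le \lambda_1 \le \cdots \le \mu_j \le \lambda_j \le \mu_{j+1} \le \cdots$. Then there exist a vector $x \in \mathbb{R}^{n-1}$ and $a \in \mathbb{R}$ such that the real symmetric matrix

$$H = \begin{pmatrix} \Lambda & x \\ x^T & a \end{pmatrix},$$

where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{n-1})$, has the eigenvalues $\mu_j$.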

Proof 

Let us compute the characteristic polynomial of $H$ from Schur's complement formula² (see Proposition 8.1.2):

$$p_H(X) = \left(X - a - x^T (X I_{n-1} - \Lambda)^{-1} x\right) \det(X I_{n-1} - \Lambda).$$

Let us assume for the moment that all the inequalities $\mu_j \le \lambda_j \le \mu_{j+1}$ hold strictly. In particular, the $\lambda_j$'s are distinct. Let us consider the partial fraction decomposition of the rational function

$$\frac{\prod_i (X - \mu_i)}{\prod_j (X - \lambda_j)} = X - a - \sum_j \frac{c_j}{X - \lambda_j}.$$

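²One may equally (exercise) compute it by induction on $n$.

One thus obtains

$$a = \sum_i \mu_i - \sum_j \lambda_j,$$

a formula that could also have been found by comparing the traces of $\Lambda$ and of $H$. The inequalities $\lambda_{j-1} < \mu_j < \lambda_j$ ensure that each $c_j$ is positive, because

$$c_j = -\frac{\prod_i (\lambda_j - \mu_i)}{\prod_{k \neq j} (\lambda_j - \lambda_k)},$$

where the numerator has the sign of $(-1)^{n-j}$ and the denominator that of $(-1)^{n-1-j}$. Let us put, then, $x_j = \sqrt{c_j}$ (or $x_j = -\sqrt{c_j}$). Since $x^T (X I_{n-1} - \Lambda)^{-1} x = \sum_j x_j^2 / (X - \lambda_j)$, we obtain, as announced,

$$p_H(X) = \prod_i (X - \mu_i).$$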

In the general case one may choose sequences $\mu^{(m)}$ and $\lambda^{(m)}$ that converge to the $\mu_i$'s and the $\lambda_j$'s as $m \to +\infty$ and that satisfy the inequalities in the hypothesis strictly. The first part of the proof (case with strict inequalities) provides matrices $H^{(m)}$. Since the spectral radius is a norm over $\mathrm{Sym}_n(\mathbb{R})$ (the spectral radius is defined in the next chapter), the sequence $(H^{(m)})_{m \in \mathbb{N}}$ is bounded. In other words, $(a^{(m)}, x^{(m)})$ remains bounded. Let us extract a subsequence that converges to a pair $(a, x) \in \mathbb{R} \times \mathbb{R}^{n-1}$. The matrix $H$ associated to $(a, x)$ solves our problem, since the eigenvalues depend continuously on the entries of the matrix.
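The proof of Theorem 3.3.4 is constructive in the strict case. The following sketch (hypothetical helper names `bordered_matrix` and `det`; plain Python) computes $a = \sum_i \mu_i - \sum_j \lambda_j$ and $x_j = \sqrt{c_j}$ from the formula for $c_j$, then compares $\det(tI_n - H)$ with $\prod_i (t - \mu_i)$ at a few sample points. For $\lambda = (1, 3)$ and $\mu = (0, 2, 5)$ one finds $a = 3$, $c_1 = 2$, $c_2 = 3$.

```python
import math

def bordered_matrix(lam, mu):
    """Sketch of the construction in the proof of Theorem 3.3.4 (strict case).
    lam: lambda_1 < ... < lambda_{n-1}; mu: strictly interlacing mu_1 < ... < mu_n.
    Returns H = [[diag(lam), x], [x^T, a]] as a list of rows."""
    m = len(lam)                      # m = n - 1
    a = sum(mu) - sum(lam)
    x = []
    for j in range(m):
        num = math.prod(lam[j] - v for v in mu)
        den = math.prod(lam[j] - lam[k] for k in range(m) if k != j)
        c_j = -num / den              # positive under strict interlacing
        x.append(math.sqrt(c_j))
    H = [[0.0] * (m + 1) for _ in range(m + 1)]
    for j in range(m):
        H[j][j] = lam[j]
        H[j][m] = H[m][j] = x[j]
    H[m][m] = a
    return H

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
    return d

lam, mu = [1.0, 3.0], [0.0, 2.0, 5.0]
H = bordered_matrix(lam, mu)          # a = 3, x = (sqrt 2, sqrt 3)
for t in [-1.0, 0.5, 4.0, 10.0]:
    char = det([[(t if i == j else 0.0) - H[i][j] for j in range(3)] for i in range(3)])
    print(t, char, math.prod(t - v for v in mu))   # the two values agree
```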



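Corollary 3.3.2 Let $H \in \mathrm{Sym}_{n-1}(\mathbb{R})$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_{n-1}$. Let $\mu_1, \dots, \mu_n$ be real numbers satisfying $\mu_1 \le \lambda_1 \le \cdots \le \mu_j \le \lambda_j \le \mu_{j+1} \le \cdots$. Then there exist a vector $x \in \mathbb{R}^{n-1}$ and $a \in \mathbb{R}$ such that the real symmetric matrix

$$H' = \begin{pmatrix} H & x \\ x^T & a \end{pmatrix}$$

has the eigenvalues $\mu_j$.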



The proof consists in diagonalizing H through an orthogonal conjugation, 
then applying the theorem, and finally performing the inverse conjugation. 



3.4 The Spectrum and the Diagonal of Hermitian Matrices

Let us begin with an order relation between finite sequences of real numbers. If $a = (a_1, \dots, a_n)$ is a sequence of $n$ real numbers, and if $1 \le k \le n$, we denote by $s_k(a)$ the number

$$s_k(a) = \min_{\operatorname{card} J = k} \sum_{j \in J} a_j.$$



Definition 3.4.1 Let $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ be two sequences of $n$ real numbers. One says that $b$ majorizes $a$, and one writes $a \prec b$, if

$$s_k(a) \le s_k(b), \quad \forall 1 \le k < n, \qquad s_n(a) = s_n(b).$$

The functions $s_k$ are symmetric:

$$s_k(a_{\sigma(1)}, \dots, a_{\sigma(n)}) = s_k(a_1, \dots, a_n)$$

for every permutation $\sigma$. One thus may always restrict attention to the case of nondecreasing sequences $a_1 \le \cdots \le a_n$. One has then $s_k(a) = a_1 + \cdots + a_k$. The relation $a \prec b$, for nondecreasing sequences, can now be written as

$$a_1 + \cdots + a_k \le b_1 + \cdots + b_k, \qquad k = 1, \dots, n - 1,$$

$$a_1 + \cdots + a_n = b_1 + \cdots + b_n.$$

The latter equality plays a crucial role in the analysis below. The relation $\prec$ is a partial ordering.

Proposition 3.4.1 Let $x, y \in \mathbb{R}^n$. Then $x \prec y$ if and only if for every real number $t$,

$$\sum_{j=1}^{n} |x_j - t| \ge \sum_{j=1}^{n} |y_j - t|. \tag{3.3}$$

Proof 

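We may assume that $x$ and $y$ are nondecreasing. If the inequality (3.3) holds, we write it first for $t$ on either side of the interval $I$ containing the $x_j$'s and the $y_j$'s. This gives $s_n(x) = s_n(y)$. Then we write it for $t = x_k$. Using $s_n(x) = s_n(y)$, we obtain

$$\begin{aligned} \sum_{j=1}^{n} |x_j - x_k| &= \sum_{j=1}^{k} (x_k - x_j) + \sum_{j=k+1}^{n} (x_j - x_k) \\ &= \sum_{j=1}^{k} (x_k - y_j) + \sum_{j=k+1}^{n} (y_j - x_k) + 2\big(s_k(y) - s_k(x)\big) \\ &\le \sum_{j=1}^{n} |y_j - x_k| + 2\big(s_k(y) - s_k(x)\big), \end{aligned}$$

which with (3.3) gives $s_k(x) \le s_k(y)$.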

Conversely, let us assume that $x \prec y$. Let us define $\phi(t) := \sum_j |x_j - t| - \sum_j |y_j - t|$. This is a piecewise linear function, zero outside $I$ (because $s_n(x) = s_n(y)$). Its derivative, integer-valued, is piecewise constant. It increases at the points $x_j$ only and decreases at the points $y_j$ only. If $\min\{\phi(t) ; t \in \mathbb{R}\} < 0$, this minimum will thus be reached at some $x_k$, with $\phi'(x_k - 0) \le 0 \le \phi'(x_k + 0)$, from which one obtains $y_{k-1} \le x_k \le y_{k+1}$. Therefore, there are two cases, depending on the position of $y_k$ with respect to $x_k$. For example, if $y_k \le x_k$, we compute

$$\sum_{j=1}^{n} |x_j - x_k| = \sum_{j=k+1}^{n} (x_j - x_k) + \sum_{j=1}^{k} (x_k - x_j).$$

From the assumption (the difference between the two sides below equals $2(s_k(y) - s_k(x)) \ge 0$), it follows that

$$\sum_{j=1}^{n} |x_j - x_k| \ge \sum_{j=k+1}^{n} (y_j - x_k) + \sum_{j=1}^{k} (x_k - y_j) = \sum_{j=1}^{n} |y_j - x_k|,$$

which means that $\phi(x_k) \ge 0$, which contradicts the hypothesis. Hence, $\phi$ is a nonnegative function.

■ 
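Proposition 3.4.1 gives two equivalent tests for $x \prec y$. The sketch below (plain Python; the function names are ours) implements both, the partial-sum test of Definition 3.4.1 and the test (3.3), and compares them exhaustively on all integer vectors of length 3 with entries in $\{0, \dots, 4\}$. Since $\phi$ is piecewise linear with kinks only at the $x_j$'s and $y_j$'s, it is enough to evaluate (3.3) at those points and at one point on each side of $I$.

```python
from itertools import product

def prec(x, y):
    """x ≺ y in the sense of Definition 3.4.1 (s_k = sum of the k smallest entries)."""
    xs, ys = sorted(x), sorted(y)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx += a
        sy += b
        if sx > sy:
            return False
    return sx == sy

def prec_abs(x, y):
    """Criterion (3.3), tested at the kinks of phi and at one point on each side of I."""
    pts = list(x) + list(y)
    ts = pts + [min(pts) - 1, max(pts) + 1]
    return all(sum(abs(a - t) for a in x) >= sum(abs(b - t) for b in y) for t in ts)

vectors = list(product(range(5), repeat=3))
mismatches = [(x, y) for x in vectors for y in vectors if prec(x, y) != prec_abs(x, y)]
print(len(mismatches))   # Proposition 3.4.1 predicts 0
```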

Our first statement expresses an order between the diagonal and the 
spectrum of a Hermitian matrix. 

Theorem 3.4.1 (Schur) Let $H$ be a Hermitian matrix with diagonal $a$ and spectrum $\lambda$. Then $a \succ \lambda$.

Proof

Let $n$ be the size of $H$. We argue by induction on $n$. We may assume that $a_n$ is the largest component of $a$. Since $s_n(\lambda) = \operatorname{Tr} H$, one has $s_n(\lambda) = s_n(a)$. In particular, the theorem holds true for order 1. Let us assume that it holds for order $n - 1$. Let $A$ be the matrix obtained from $H$ by deleting the $n$th row and the $n$th column. Let $\mu = (\mu_1, \dots, \mu_{n-1})$ be the spectrum of $A$.

Let us arrange $\lambda$ and $\mu$ in increasing order. From Theorem 3.3.3, one has $\lambda_1 \le \mu_1 \le \lambda_2 \le \cdots \le \mu_{n-1} \le \lambda_n$. It follows that $s_k(\mu) \ge s_k(\lambda)$ for every $k < n$. The induction hypothesis tells us that $s_k(\mu) \le s_k(a')$, where $a' = (a_1, \dots, a_{n-1})$. Finally, we have $s_k(a') = s_k(a)$ for $k < n$ (because $a_n$ is the largest component of $a$), and therefore $s_k(\lambda) \le s_k(a)$ for every $k < n$, which ends the induction.

■
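A numerical check of Schur's theorem, under the same assumptions as the earlier interlacing example: plain Python, our own cyclic Jacobi routine `jacobi_eigenvalues` (not a library call), and a random real symmetric matrix. The sketch compares $s_k(\lambda)$ with $s_k(a)$ for every $k$.

```python
import math
import random

def jacobi_eigenvalues(A, tol=1e-22, max_sweeps=50):
    """Sorted eigenvalues of a real symmetric matrix by cyclic Jacobi rotations (a sketch)."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new (p, q) entry vanishes.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def s(seq, k):
    """s_k: the sum of the k smallest entries."""
    return sum(sorted(seq)[:k])

random.seed(1)
n = 6
B = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
H = [[(B[i][j] + B[j][i]) / 2 for j in range(n)] for i in range(n)]
a = [H[i][i] for i in range(n)]      # diagonal of H
lam = jacobi_eigenvalues(H)          # spectrum of H
for k in range(1, n + 1):
    print(k, round(s(lam, k), 6), round(s(a, k), 6))   # s_k(lam) <= s_k(a), equal at k = n
```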

Here is the converse. 

Theorem 3.4.2 Let $a$ and $\lambda$ be two sequences of $n$ real numbers such that $a \succ \lambda$. Then there exists a real symmetric matrix of size $n \times n$ whose diagonal is $a$ and spectrum is $\lambda$.

Proof

We proceed by induction on $n$. The statement is trivial if $n = 1$. If $n \ge 2$, we use the following lemma, which will be proved afterwards.

Lemma 3.4.1 Let $n \ge 2$ and $\alpha, \beta$ two nondecreasing sequences of $n$ real numbers, satisfying $\alpha \prec \beta$. Then there exists a sequence $\gamma$ of $n - 1$ real numbers such that

$$\alpha_1 \le \gamma_1 \le \alpha_2 \le \cdots \le \gamma_{n-1} \le \alpha_n$$

and $\gamma \prec \beta' = (\beta_1, \dots, \beta_{n-1})$.



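We apply the lemma to the sequences $\alpha = \lambda$, $\beta = a$, both arranged in nondecreasing order. Since $\gamma \prec a'$, the induction hypothesis tells us that there exists a real symmetric matrix $S'$ of size $(n - 1) \times (n - 1)$ with diagonal $a'$ and spectrum $\gamma$. From Corollary 3.3.2, there exist a vector $y \in \mathbb{R}^{n-1}$ and $b \in \mathbb{R}$ such that the matrix

$$S = \begin{pmatrix} S' & y \\ y^T & b \end{pmatrix}$$

has spectrum $\lambda$. Since $s_n(a) = s_n(\lambda) = \operatorname{Tr} S = \operatorname{Tr} S' + b = s_{n-1}(a') + b$, we have $b = a_n$. Hence, $a$ is the diagonal of $S$.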



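We prove now Lemma 3.4.1. Let $\mathcal{A}$ be the set of sequences $\delta$ of $n - 1$ real numbers satisfying

$$\alpha_1 \le \delta_1 \le \alpha_2 \le \cdots \le \delta_{n-1} \le \alpha_n \tag{3.4}$$

together with

$$\sum_{j=1}^{k} \delta_j \le \sum_{j=1}^{k} \beta_j, \qquad \forall k \le n - 2. \tag{3.5}$$

By (3.4), every $\delta \in \mathcal{A}$ is nondecreasing, so that $s_k(\delta) = \delta_1 + \cdots + \delta_k$. We must show that there exists $\delta \in \mathcal{A}$ such that $s_{n-1}(\delta) = s_{n-1}(\beta')$. Since $\mathcal{A}$ is convex and compact (it is closed and bounded in $\mathbb{R}^{n-1}$), its image under the linear map $s_{n-1}$ is a closed interval, and it is enough to show that

$$\inf_{\mathcal{A}} s_{n-1} \le s_{n-1}(\beta') \le \sup_{\mathcal{A}} s_{n-1}. \tag{3.6}$$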

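On the one hand, $\alpha' = (\alpha_1, \dots, \alpha_{n-1})$ belongs to $\mathcal{A}$ and $s_{n-1}(\alpha') \le s_{n-1}(\beta')$ from the hypothesis, which proves the first inequality in (3.6).

Let us now choose a $\delta$ that achieves the supremum of $s_{n-1}$ over $\mathcal{A}$. Let $r$ be the largest index less than or equal to $n - 2$ such that $s_r(\delta) = s_r(\beta')$, with $r = 0$ if all the inequalities are strict. Since $s_j(\delta) < s_j(\beta')$ for $r < j \le n - 2$, one has $\delta_j = \alpha_{j+1}$ for $r < j \le n - 1$; otherwise, there would exist $\varepsilon > 0$ such that $\tilde{\delta} := \delta + \varepsilon e^j$ belongs to $\mathcal{A}$, and one would have $s_{n-1}(\tilde{\delta}) = s_{n-1}(\delta) + \varepsilon$, contrary to the maximality of $\delta$. Now let us compute

$$\begin{aligned} s_{n-1}(\delta) - s_{n-1}(\beta') &= s_r(\beta) - s_{n-1}(\beta) + \alpha_{r+2} + \cdots + \alpha_n \\ &= s_r(\beta) - s_{n-1}(\beta) + s_n(\alpha) - s_{r+1}(\alpha) \\ &\ge s_r(\beta) - s_{n-1}(\beta) + s_n(\beta) - s_{r+1}(\beta) \\ &= \beta_n - \beta_{r+1} \ge 0. \end{aligned}$$

This proves (3.6) and completes the proof of the lemma.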
3.4.1 Hadamard's Inequality

Proposition 3.4.2 Let $H \in \mathbf{H}_n$ be a positive semidefinite Hermitian matrix. Then

$$\det H \le \prod_{j=1}^{n} h_{jj}.$$

If $H \in \mathbf{HPD}_n$, the equality holds only if $H$ is diagonal.

Proof 

If $\det H = 0$, there is nothing to prove, because the $h_{jj}$ are nonnegative (these are the numbers $(e^j)^* H e^j$). Otherwise, $H$ is positive definite and one has $h_{jj} > 0$. We restrict attention to the case with a constant diagonal by letting $D := \operatorname{diag}(h_{11}^{-1/2}, \dots, h_{nn}^{-1/2})$ and writing $(\det H) / (\prod_j h_{jj}) = \det DHD = \det H'$, where the diagonal entries of $H'$ equal one. There remains to prove that $\det H' \le 1$. Now the eigenvalues $\mu_1, \dots, \mu_n$ of $H'$ are strictly positive, of sum $\operatorname{Tr} H' = n$. Since the logarithm is concave, one has

$$\frac{1}{n} \log \det H' = \frac{1}{n} \sum_j \log \mu_j \le \log\Big(\frac{1}{n} \sum_j \mu_j\Big) = \log 1 = 0,$$

which proves the inequality. Since the concavity is strict, the equality holds only if $\mu_1 = \cdots = \mu_n = 1$, but then $H'$ is similar, thus equal, to $I_n$. In that case, $H$ is diagonal.

■ 
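Hadamard's inequality is easy to test on Gram matrices $H = B^T B$, which are positive semidefinite. The sketch below (plain Python; `det` and `gram` are hypothetical helpers, `det` using Gaussian elimination with partial pivoting, and the matrices are random) records the ratio $\det H / \prod_j h_{jj}$, which Proposition 3.4.2 bounds by 1.

```python
import random

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
    return d

def gram(B):
    """H = B^T B, a positive semidefinite symmetric matrix."""
    rows, n = len(B), len(B[0])
    return [[sum(B[k][i] * B[k][j] for k in range(rows)) for j in range(n)]
            for i in range(n)]

random.seed(2)
ratios = []
for _ in range(200):
    B = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
    H = gram(B)
    prod_diag = 1.0
    for i in range(4):
        prod_diag *= H[i][i]
    ratios.append(det(H) / prod_diag)      # Proposition 3.4.2: at most 1
print(min(ratios), max(ratios))
print(det([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]))  # diagonal case: 30.0
```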

Applying Proposition 3.4.2 to matrices of the form $M^* M$ or $M M^*$, one obtains the following result.

Theorem 3.4.3 For $M \in M_n(\mathbb{C})$, one has

$$|\det M| \le \prod_{i=1}^{n} \Big(\sum_{j=1}^{n} |m_{ij}|^2\Big)^{1/2}, \qquad |\det M| \le \prod_{j=1}^{n} \Big(\sum_{i=1}^{n} |m_{ij}|^2\Big)^{1/2}.$$

When $M \in GL_n(\mathbb{C})$, the first (respectively the second) inequality is an equality only if the rows (respectively the columns) of $M$ are pairwise orthogonal.
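Theorem 3.4.3 can be checked the same way for complex matrices. In the sketch below (plain Python with built-in complex numbers; `det`, `row_bound`, and `col_bound` are illustrative helpers), the product of the Euclidean norms of the rows, respectively of the columns, is compared with $|\det M|$. The matrix $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, whose rows and columns are pairwise orthogonal, attains both bounds.

```python
import math
import random

def det(M):
    """Gaussian elimination with partial pivoting; abs() is the modulus for complex entries."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
    return d

def row_bound(M):
    """Product of the Euclidean norms of the rows of M."""
    return math.prod(math.sqrt(sum(abs(v) ** 2 for v in row)) for row in M)

def col_bound(M):
    """Product of the Euclidean norms of the columns of M."""
    return row_bound([list(col) for col in zip(*M)])

random.seed(3)
worst = 0.0
for _ in range(100):
    M = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
         for _ in range(4)]
    d = abs(det(M))
    worst = max(worst, d / row_bound(M), d / col_bound(M))
print(worst)   # Theorem 3.4.3 predicts a value <= 1

Q = [[1.0, 1.0], [1.0, -1.0]]      # pairwise orthogonal rows and columns
print(abs(det(Q)), row_bound(Q), col_bound(Q))  # all three approximately 2
```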




3.5 Exercises 

1. Show that the eigenvalues of skew-Hermitian matrices, as well as those of real skew-symmetric matrices, are pure imaginary.

2. Let $P, Q \in M_n(\mathbb{R})$ be given. Assume that $P + iQ \in GL_n(\mathbb{C})$. Show that there exist $a, b \in \mathbb{R}$ such that $aP + bQ \in GL_n(\mathbb{R})$. Deduce that if $M, N \in M_n(\mathbb{R})$ are similar in $M_n(\mathbb{C})$, then these matrices are similar in $M_n(\mathbb{R})$.
3. Show that a triangular and normal matrix is diagonal. Deduce that if $U^* T U$ is a unitary trigonalization of $M$, and if $M$ is normal, then $T$ is diagonal.

4. For $A \in M_n(\mathbb{R})$ symmetric positive definite, show that

$$\max_{i,j \le n} |a_{ij}| = \max_{i \le n} a_{ii}.$$



5. Given an invertible matrix

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in GL_2(\mathbb{R}),$$

define a map $h_M$ from $S^2 := \mathbb{C} \cup \{\infty\}$ into itself by

$$h_M(z) := \frac{az + b}{cz + d}.$$

(a) Show that $h_M$ is a bijection.

(b) Show that $h : M \mapsto h_M$ is a group homomorphism. Compute its kernel.

(c) Let $\mathcal{H}$ be the upper half-plane, consisting of those $z \in \mathbb{C}$ with $\Im z > 0$. Compute $\Im h_M(z)$ in terms of $\Im z$ and deduce that the subgroup

$$GL_2^+(\mathbb{R}) := \{M \in GL_2(\mathbb{R}) \mid \det M > 0\}$$

acts on $\mathcal{H}$.

(d) Conclude that the group $PSL_2(\mathbb{R}) := SL_2(\mathbb{R}) / \{\pm I_2\}$, called the modular group, acts on $\mathcal{H}$.

(e) Let $M \in SL_2(\mathbb{R})$ be given. Determine, in terms of $\operatorname{Tr} M$, the number of fixed points of $h_M$ on $\mathcal{H}$.



6. Show that the supremum of a family of convex functions on $\mathbb{R}^N$ is convex. Deduce that the map $M \mapsto \lambda_n$ (largest eigenvalue of $M$) defined on $\mathbf{H}_n$ is convex.



7. Show that $M \in M_n(\mathbb{C})$ is normal if and only if there exists a unitary matrix $U$ such that $M^* = MU$.



8. Show that in $M_n(\mathbb{C})$ the set of diagonalizable matrices is dense. Hint: Use Theorem 3.1.3.

9. Let $(a_1, \dots, a_n)$ and $(b_1, \dots, b_n)$ be two sequences of real numbers. Find the supremum and the infimum of $\operatorname{Tr}(AB)$ as $A$ (respectively $B$) runs over the Hermitian matrices with spectrum equal to $(a_1, \dots, a_n)$ (respectively $(b_1, \dots, b_n)$).



10. (Kantorovich inequality)

(a) Let $a_1 < \cdots < a_n$ be a list of real numbers, with $a_n^{-1} = a_1 > 0$. Define

$$l(u) := \sum_{j=1}^{n} a_j u_j, \qquad L(u) := \sum_{j=1}^{n} \frac{u_j}{a_j}.$$

Let $K_n$ be the simplex of $\mathbb{R}^n$ defined by the constraints $u_j \ge 0$ for every $j = 1, \dots, n$, and $\sum_j u_j = 1$. Show that there exists an element $v \in K_n$ that maximizes $l + L$ and minimizes $|L - l|$ on $K_n$ simultaneously.

(b) Deduce that

$$\max_{u \in K_n} l(u) L(u) = \frac{(a_1 + a_n)^2}{4 a_1 a_n}.$$

(c) Let $A \in \mathbf{HPD}_n$ and let $a_1, a_n$ be the smallest and largest eigenvalues of $A$. Show that for every $x \in \mathbb{C}^n$,

$$(x^* A x)(x^* A^{-1} x) \le \frac{(a_1 + a_n)^2}{4 a_1 a_n} \|x\|_2^4.$$

11. (Weyl's inequalities)

Let $A, B$ be two Hermitian matrices of size $n \times n$ whose respective eigenvalues are $\alpha_1 \le \cdots \le \alpha_n$ and $\beta_1 \le \cdots \le \beta_n$. Define $C = A + B$ and let $\gamma_1 \le \cdots \le \gamma_n$ be its eigenvalues.

(a) Show that $\alpha_j + \beta_1 \le \gamma_j \le \alpha_j + \beta_n$.

(b) Let us recall that if $F$ is a linear subspace of $\mathbb{C}^n$, one writes

$$R_A(F) = \max\{x^* A x \mid x \in F, \|x\|_2 = 1\}.$$

Show that if $G, H$ are two linear subspaces of $\mathbb{C}^n$, then $R_C(G \cap H) \le R_A(G) + R_B(H)$.

(c) Deduce that if $l, m \ge 1$ and $l + m = k + n$ (hence $l + m \ge n + 1$), then

$$\gamma_k \le \alpha_l + \beta_m.$$

(d) Similarly, show that $l + m = k + 1$ implies

$$\gamma_k \ge \alpha_l + \beta_m.$$

(e) Conclude that the function $A \mapsto \lambda_k(A)$ that associates to a Hermitian matrix its $k$th eigenvalue (in increasing order) is Lipschitz with ratio 1, meaning that

$$|\lambda_k(B) - \lambda_k(A)| \le \|B - A\|_2 = \rho(B - A)$$

(see the next chapter for the meaning of the norm $\|M\|_2$ and for the spectral radius $\rho(M)$).

Remark: The description of the set of the $3n$-tuples $(\alpha, \beta, \gamma)$ as $A$ and $B$ run over $\mathbf{H}_n$ is especially delicate. For a complete historical account of this question, one may read the first section of Fulton's and Bhatia's articles [16, 6]. For another partial result, see Exercise 19 of Chapter 5 (theorem of Lidskii).

12. Let $A$ be a Hermitian matrix of size $n \times n$ whose eigenvalues are $\alpha_1 \le \cdots \le \alpha_n$. Let $B$ be a Hermitian positive semidefinite matrix. Let $\gamma_1 \le \cdots \le \gamma_n$ be the eigenvalues of $A + B$. Show that $\gamma_k \ge \alpha_k$.

13. Let $M, N$ be two Hermitian matrices such that $N$ and $M - N$ are positive semidefinite. Show that $\det N \le \det M$.



14. Let $A \in M_p(\mathbb{C})$, $C \in M_q(\mathbb{C})$ be given with $p, q \ge 1$. Assume that

$$M := \begin{pmatrix} A & B \\ B^* & C \end{pmatrix}$$

is Hermitian positive definite. Show that $\det M \le (\det A)(\det C)$. Use the previous exercise and Proposition 8.1.2.



15. For $M \in \mathbf{HPD}_n$, we denote by $P_k(M)$ the product of all the principal minors of order $k$ of $M$. There are

$$\binom{n}{k}$$

such minors.

Applying Proposition 3.4.2 to the matrix $M^{-1}$, show that

$$P_n(M)^{n-1} \le P_{n-1}(M),$$

and then in general that

$$P_{k+1}(M)^{k} \le P_k(M)^{n-k}.$$

16. Let $d : M_n(\mathbb{R}) \to \mathbb{R}$ be a multiplicative function; that is,

$$d(MN) = d(M) d(N)$$

for every $M, N \in M_n(\mathbb{R})$. If $\alpha \in \mathbb{R}$, define $\delta(\alpha) := d(\alpha I_n)^{1/n}$.

Assume that $d$ is not constant.

(a) Show that $d(0_n) = 0$ and $d(I_n) = 1$. Deduce that $P \in GL_n(\mathbb{R})$ implies $d(P) \neq 0$ and $d(P^{-1}) = 1/d(P)$. Show, finally, that if $M$ and $N$ are similar, then $d(M) = d(N)$.

(b) Let $D \in M_n(\mathbb{R})$ be diagonal. Find matrices $D_1, \dots, D_{n-1}$, similar to $D$, such that $D D_1 \cdots D_{n-1} = (\det D) I_n$. Deduce that $d(D) = \delta(\det D)$.

(c) Let $M \in M_n(\mathbb{R})$ be a diagonalizable matrix. Show that $d(M) = \delta(\det M)$.

(d) Using the fact that $M^T$ is similar to $M$, show that $d(M) = \delta(\det M)$ for every $M \in M_n(\mathbb{R})$.
17. Let $B \in GL_n(\mathbb{C})$. Verify that the inverse and the Hermitian adjoint of $B^{-1} B^*$ are similar. Conversely, let $A \in GL_n(\mathbb{C})$ be a matrix whose inverse and Hermitian adjoint are similar: $A^* = P A^{-1} P^{-1}$.

(a) Show that there exists an invertible Hermitian matrix $H$ such that $H = A^* H A$. Look for $H$ as a linear combination of $P$ and of $P^*$.

(b) Show that there exists a matrix $B \in GL_n(\mathbb{C})$ such that $A = B^{-1} B^*$. Look for a $B$ of the form $(a I_n + b A^*) H$.

18. Let $A \in M_n(\mathbb{C})$ be given, and let $\lambda_1, \dots, \lambda_n$ be its eigenvalues. Show, by induction on $n$, that $A$ is normal if and only if

$$\sum_{i,j} |a_{ij}|^2 = \sum_{l=1}^{n} |\lambda_l|^2.$$

Hint: The left-hand side (whose square root is called Schur's norm) is invariant under conjugation by a unitary matrix. It is then enough to restrict attention to the case of a triangular matrix.

19. (a) Show that $|\det(I_n + H)| \ge 1$ for every skew-Hermitian matrix $H$, and that equality holds only if $H = 0_n$.

(b) Deduce that for every $M \in M_n(\mathbb{C})$ such that $H := (M + M^*)/2$ is positive definite,

$$\det H \le |\det M|,$$

by showing that $H^{-1}(M - M^*)$ is similar to a skew-Hermitian matrix. You may use the square root defined in Chapter 7.

20. Describe every positive semidefinite matrix $M \in \mathrm{Sym}_n(\mathbb{R})$ such that $m_{jj} = 1$ for every $j$ and possessing the eigenvalue $\lambda = n$ (first show that $M$ has rank one).

21. If $A, B \in M_{n \times m}(\mathbb{C})$, define the Hadamard product of $A$ and $B$ by

$$A \circ B := (a_{ij} b_{ij})_{1 \le i \le n, 1 \le j \le m}.$$

(a) Let $A, B$ be two Hermitian matrices. Verify that $A \circ B$ is Hermitian.

(b) Assume that $A$ and $B$ are positive semidefinite, of respective ranks $p$ and $q$. Using Proposition 3.2.1, show that there exist $pq$ vectors $z_{\alpha\beta}$ such that

$$A \circ B = \sum_{\alpha, \beta} z_{\alpha\beta} z_{\alpha\beta}^*.$$

Deduce that $A \circ B$ is positive semidefinite.

(c) If $A$ and $B$ are positive definite, show that $A \circ B$ also is positive definite.

(d) Construct an example for which $p, q < n$, but $A \circ B$ is positive definite.

22. (Fiedler and Pták [13]) Given a matrix $A \in M_n(\mathbb{R})$, we wish to prove the equivalence of the following properties:

P1 For every vector $x \neq 0$ there exists an index $k$ such that $x_k (Ax)_k > 0$.

P2 For every vector $x \neq 0$ there exists a diagonal matrix $D$ with positive diagonal elements such that the scalar product $(Ax, Dx)$ is positive.

P3 For every vector $x \neq 0$ there exists a diagonal matrix $D$ with nonnegative diagonal elements such that the scalar product $(Ax, Dx)$ is positive.

P4 The real eigenvalues of all principal submatrices of $A$ are positive.
P5 All principal minors of $A$ are positive.

We shall use the following notation: if $x \in \mathbb{R}^n$ and if $J$ is the index set of its nonzero components, then $x_J$ denotes the vector of $\mathbb{R}^k$, where $k$ is the cardinality of $J$, obtained by retaining only the nonzero components of $x$. To the set $J$ one also associates the matrix $A_J$, retaining only the indices in $J$.

(a) Prove that P$j$ implies P$(j+1)$ for every $j = 1, \dots, 4$.

(b) Assume P5. Show that for every diagonal matrix $D$ with nonnegative entries, one has $\det(A + D) > 0$.

(c) Then prove that P5 implies P1.




4 

Norms 



4.1 A Brief Review 

In this chapter, the field $K$ will always be $\mathbb{R}$ or $\mathbb{C}$ and $E$ will denote $K^n$.

If $A \in M_n(K)$, the spectral radius of $A$, denoted by $\rho(A)$, is defined as the largest modulus of the eigenvalues of $A$:

$$\rho(A) = \max\{|\lambda| ; \lambda \in \operatorname{Sp}(A)\}.$$

When $K = \mathbb{R}$, one takes into account the complex eigenvalues when computing $\rho(A)$.

The scalar (if $K = \mathbb{R}$) or Hermitian (if $K = \mathbb{C}$) product on $E$ is denoted by $(x, y) := \sum_j x_j \bar{y}_j$. The vector space $E$ is endowed with various norms, pairwise equivalent since $E$ has finite dimension (Proposition 4.1.3 below). Among these, the most used norms are the $\ell^p$ norms:

$$\|x\|_p = \Big(\sum_j |x_j|^p\Big)^{1/p}, \qquad \|x\|_\infty = \max_j |x_j|.$$



Proposition 4.1.1 For $1 \le p \le \infty$, the map $x \mapsto \|x\|_p$ is a norm on $E$. In particular, one has Minkowski's inequality

$$\|x + y\|_p \le \|x\|_p + \|y\|_p. \tag{4.1}$$

Furthermore, one has Hölder's inequality

$$|(x, y)| \le \|x\|_p \|y\|_{p'}, \qquad \frac{1}{p} + \frac{1}{p'} = 1. \tag{4.2}$$

The numbers $p, p'$ are called conjugate exponents.

Proof 

Everything except the Hölder and Minkowski inequalities is obvious. When $p = 1$ or $p = \infty$, these inequalities are trivial. We thus assume that $1 < p < \infty$.

Let us begin with (4.2). If $x$ or $y$ is null, it is obvious. Moreover, one can assume, by decreasing the value of $n$, that none of the $x_j, y_j$'s is null: discarding the indices $j$ with $x_j y_j = 0$ leaves $(x, y)$ unchanged and does not increase the norms. Likewise, since $|(x, y)| \le \sum_j |x_j| \, |y_j|$, one can also assume that the $x_j, y_j$ are real and positive. Dividing by $\|x\|_p$ and by $\|y\|_{p'}$, one may restrict attention to the case where $\|x\|_p = \|y\|_{p'} = 1$. Hence, $x_j, y_j \in (0, 1]$ for every $j$. Let us define

$$a_j = p \log x_j, \qquad b_j = p' \log y_j.$$

Since the exponential function is convex,

$$e^{a_j/p + b_j/p'} \le \frac{1}{p} e^{a_j} + \frac{1}{p'} e^{b_j},$$

that is,

$$x_j y_j \le \frac{1}{p} x_j^p + \frac{1}{p'} y_j^{p'}.$$

Summing over $j$, we obtain

$$(x, y) \le \frac{1}{p} \|x\|_p^p + \frac{1}{p'} \|y\|_{p'}^{p'} = \frac{1}{p} + \frac{1}{p'} = 1,$$

which proves (4.2).

We now turn to (4.1). First, we have

$$\|x + y\|_p^p = \sum_k |x_k + y_k|^p \le \sum_k |x_k| \, |x_k + y_k|^{p-1} + \sum_k |y_k| \, |x_k + y_k|^{p-1}.$$

Let us apply Hölder's inequality to each of the two terms of the right-hand side. For example,

$$\sum_k |x_k| \, |x_k + y_k|^{p-1} \le \|x\|_p \Big(\sum_k |x_k + y_k|^{(p-1)p'}\Big)^{1/p'},$$

which amounts to, since $(p - 1)p' = p$,

$$\sum_k |x_k| \, |x_k + y_k|^{p-1} \le \|x\|_p \|x + y\|_p^{p-1}.$$

Finally,

$$\|x + y\|_p^p \le (\|x\|_p + \|y\|_p) \|x + y\|_p^{p-1},$$

which gives (4.1) after division by $\|x + y\|_p^{p-1}$ (if $x + y = 0$, there is nothing to prove).
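A small numerical check of (4.1) and (4.2) for real vectors and several exponents, including the limiting cases $p = 1$ and $p = \infty$. This is a plain-Python sketch; the helper names `pnorm` and `conjugate` are ours. It records the largest violation observed, which should not exceed rounding error.

```python
import random

INF = float("inf")

def pnorm(x, p):
    """The l^p norm of a list of numbers, 1 <= p <= inf."""
    if p == INF:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent p' with 1/p + 1/p' = 1."""
    if p == 1:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1)

random.seed(4)
worst_holder = worst_minkowski = 0.0
for p in [1, 1.5, 2, 3, 7, INF]:
    q = conjugate(p)
    for _ in range(500):
        x = [random.uniform(-1, 1) for _ in range(5)]
        y = [random.uniform(-1, 1) for _ in range(5)]
        dot = abs(sum(a * b for a, b in zip(x, y)))
        worst_holder = max(worst_holder, dot - pnorm(x, p) * pnorm(y, q))        # (4.2)
        s = [a + b for a, b in zip(x, y)]
        worst_minkowski = max(worst_minkowski,
                              pnorm(s, p) - pnorm(x, p) - pnorm(y, p))           # (4.1)
print(worst_holder, worst_minkowski)   # both should be <= 0 up to rounding
```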



For $p = 2$, the norm $\|\cdot\|_2$ is given by a Hermitian form and thus satisfies the Cauchy–Schwarz inequality:

$$|(x, y)| \le \|x\|_2 \|y\|_2.$$

This is a particular case of Hölder's inequality.



Proposition 4.1.2 For conjugate exponents $p, p'$, one has

$$\|x\|_p = \sup_{y \neq 0} \frac{|(x, y)|}{\|y\|_{p'}}.$$

Proof

The inequality $\ge$ is a consequence of Hölder's. The reverse inequality is obtained by taking $y_j = x_j |x_j|^{p-2}$ (and $y_j = 0$ if $x_j = 0$) if $p < \infty$. If $p = \infty$, choose $y_j = x_j$ for an index $j$ such that $|x_j| = \|x\|_\infty$. For $k \neq j$, take $y_k = 0$.
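The maximizing vector of Proposition 4.1.2 can be verified directly. The sketch below (plain Python, complex vectors; `maximizer` and `hermitian` are hypothetical helper names) builds $y_j = x_j |x_j|^{p-2}$ and compares $|(x, y)| / \|y\|_{p'}$ with $\|x\|_p$, using the Hermitian product $(x, y) = \sum_j x_j \bar{y}_j$.

```python
import random

def pnorm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def maximizer(x, p):
    """y_j = x_j |x_j|^(p-2), with y_j = 0 when x_j = 0 (the choice made in the proof)."""
    return [v * abs(v) ** (p - 2) if v != 0 else 0j for v in x]

def hermitian(x, y):
    """(x, y) = sum_j x_j * conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

random.seed(5)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
results = []
for p in [1.25, 2.0, 4.0]:
    q = p / (p - 1)                       # conjugate exponent p'
    y = maximizer(x, p)
    results.append((abs(hermitian(x, y)) / pnorm(y, q), pnorm(x, p)))
print(results)   # in each pair the two numbers agree up to rounding
```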



Definition 4.1.1 Two norms $N$ and $N'$ on a (real or complex) vector space are said to be equivalent if there exist two numbers $c, c' \in \mathbb{R}$ such that

$$N \le c N', \qquad N' \le c' N.$$

The equivalence between norms is obviously an equivalence relation, as 
its name implies. As announced above, we have the following result. 

Proposition 4.1.3 All norms on $E = K^n$ are equivalent. For example,

$$\|x\|_\infty \le \|x\|_p \le n^{1/p} \|x\|_\infty.$$

Proof 

It is sufficient to show that every norm is equivalent to $\|\cdot\|_1$.

Let $N$ be a norm on $E$. If $x \in E$, the triangle inequality gives

$$N(x) \le \sum_i |x_i| N(e^i),$$

where $(e^1, \dots, e^n)$ is the canonical basis. One thus has $N \le c \|\cdot\|_1$ for $c := \max_i N(e^i)$. Observe that this first inequality expresses the fact that $N$ is Lipschitz (hence continuous) on the metric space $X = (E, \|\cdot\|_1)$.

For the reverse inequality, we argue by contradiction: Let us assume that the supremum of $\|x\|_1 / N(x)$ is infinite for $x \neq 0$. By homogeneity, there would then exist a sequence of vectors $(x^m)_{m \in \mathbb{N}}$ such that $\|x^m\|_1 = 1$ and $N(x^m) \to 0$ when $m \to +\infty$. Since the unit sphere of $X$ is compact, one may assume (up to the extraction of a subsequence) that $x^m$ converges to a vector $x$ such that $\|x\|_1 = 1$. In particular, $x \neq 0$. Since $N$ is continuous on $X$, one has also $N(x) = \lim_{m \to +\infty} N(x^m) = 0$. Since $N$ is a norm, we deduce $x = 0$, a contradiction.
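The explicit bounds of Proposition 4.1.3 can be tested on random vectors. In this plain-Python sketch, `pnorm` and `check` are illustrative helpers and the sampling ranges are arbitrary. The left inequality is an equality for a vector of the canonical basis, and the right one for the vector $(1, \dots, 1)$.

```python
import random

def pnorm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def check(x, p, rel=1e-12):
    """||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf, with a small rounding tolerance."""
    inf_norm = max(abs(v) for v in x)
    lp = pnorm(x, p)
    return inf_norm <= lp * (1 + rel) and lp <= len(x) ** (1.0 / p) * inf_norm * (1 + rel)

random.seed(6)
n = 8
all_ok = all(check([random.uniform(-5, 5) for _ in range(n)], p)
             for p in [1, 2, 3.5, 10] for _ in range(1000))
print(all_ok)
# Equality cases: e^1 for the left bound, (1, ..., 1) for the right bound (p = 2 here).
print(pnorm([1.0] + [0.0] * (n - 1), 3.5), pnorm([1.0] * n, 2), n ** 0.5)
```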
4.1.1 Duality

Definition 4.1.2 Given a norm $\|\cdot\|$ on $\mathbb{R}^n$, its dual norm on $\mathbb{R}^n$ is defined by

$$\|x\|' := \sup_{y \neq 0} \frac{y^T x}{\|y\|}.$$

The fact that $\|\cdot\|'$ is a norm is obvious. The dual of a norm on $\mathbb{C}^n$ is defined in a similar way, with $\Re(y^* x)$ instead of $y^T x$. For every $x, y \in \mathbb{C}^n$, one has

$$|y^* x| \le \|x\| \, \|y\|'. \tag{4.3}$$

Proposition 4.1.2 shows that the dual norm of $\|\cdot\|_p$ is $\|\cdot\|_q$ for $1/p + 1/q = 1$. This suggests the following property.
This suggests the following property. 

Proposition 4.1.4 The bidual (dual of the dual norm) of a norm is this norm itself:

$$(\|\cdot\|')' = \|\cdot\|.$$

Proof

From (4.3), one has $(\|\cdot\|')' \le \|\cdot\|$. The converse is a consequence of the Hahn–Banach theorem: the unit ball $B$ of $\|\cdot\|$ is convex and compact. If $x$ is a point of its boundary (that is, $\|x\| = 1$), there exists an $\mathbb{R}$-affine (that is, of the form constant plus $\mathbb{R}$-linear) function that is zero at $x$ and nonpositive on $B$. Such a function can be written in the form $y \mapsto \Re(z^* y) + c$, where $c$ is a constant, necessarily equal to $-\Re(z^* x)$. Without loss of generality, one may assume that $z^* x$ is real. Hence

$$\|z\|' = \sup_{\|y\| = 1} \Re(z^* y) = z^* x.$$

One deduces

$$(\|x\|')' \ge \frac{z^* x}{\|z\|'} = 1 = \|x\|.$$

By homogeneity, this is true for every $x \in \mathbb{C}^n$.



4.1.2 Matrix Norms

Let us recall that $M_n(K)$ can be identified with the set of endomorphisms of $E = K^n$ by

$$A \mapsto (x \mapsto Ax).$$

Definition 4.1.3 If $\|\cdot\|$ is a norm on $E$ and if $A \in M_n(K)$, we define

$$\|A\| := \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|}.$$

Equivalently,

$$\|A\| = \sup_{\|x\| \le 1} \|Ax\| = \max_{\|x\| \le 1} \|Ax\|.$$

One verifies easily that $A \mapsto \|A\|$ is a norm on $M_n(K)$. It is called the norm induced by that of $E$, or the norm subordinated to that of $E$. Though we adopted the same notation $\|\cdot\|$ for the two norms, that on $E$ and that on $M_n(K)$, these are, of course, distinct objects. In many places, one finds the notation $|||\cdot|||$ for the induced norm. When one does not wish to mention from which norm on $E$ a given norm on $M_n(K)$ is induced, one says that $A \mapsto \|A\|$ is a matrix norm. The main properties of matrix norms are

$$\|AB\| \le \|A\| \, \|B\|, \qquad \|I_n\| = 1.$$

The first of these properties is the defining property of an algebra norm (also called norm of algebra, see Section 4.4). In particular, one has $\|A^k\| \le \|A\|^k$ for every $k \in \mathbb{N}$.

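Here are a few examples of norms induced by the norms $\|\cdot\|_p$ on $E$:

$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|, \qquad \|A\|_2 = \sqrt{\rho(A^* A)}.$$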



To prove these formulas, we begin by proving the inequalities $\ge$, selecting a suitable vector $x$, and writing $\|A\|_p \ge \|Ax\|_p / \|x\|_p$. For $p = 1$ we choose an index $j$ such that the maximum in the above formula is achieved. Then we let $x_j = 1$, while $x_k = 0$ otherwise. For $p = \infty$, we let $x_j = \bar{a}_{i_0 j} / |a_{i_0 j}|$ (and $x_j = 1$ if $a_{i_0 j} = 0$), where $i_0$ achieves the maximum in the above formula. For $p = 2$ we choose an eigenvector of $A^* A$ associated to an eigenvalue of maximal modulus. We thus obtain three inequalities. The reverse inequalities are direct consequences of the definitions. The values of $\|A\|_1$ and $\|A\|_\infty$ illustrate a particular case of the general formula

$$\|A^*\|' = \|A\| = \sup_{x \neq 0} \sup_{y \neq 0} \frac{\Re(y^* A x)}{\|x\| \, \|y\|'},$$

where $\|\cdot\|'$ also denotes the norm on $M_n(K)$ induced by the dual norm.
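The formulas for $\|A\|_1$ and $\|A\|_\infty$, and the vectors used above to attain them, can be checked as follows. This plain-Python sketch works with a random real matrix, for which $\bar{a}_{i_0 j} / |a_{i_0 j}|$ reduces to the sign of $a_{i_0 j}$; the helper names are ours. Random test vectors should never exceed the formulas.

```python
import random

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def norm1(x):
    return sum(abs(v) for v in x)

def norminf(x):
    return max(abs(v) for v in x)

def induced_1(A):
    """Maximal column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A)))

def induced_inf(A):
    """Maximal row sum."""
    return max(sum(abs(v) for v in row) for row in A)

random.seed(7)
n = 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

# The vectors used in the text to attain the maxima.
j0 = max(range(n), key=lambda j: sum(abs(row[j]) for row in A))
e = [1.0 if k == j0 else 0.0 for k in range(n)]
i0 = max(range(n), key=lambda i: sum(abs(v) for v in A[i]))
s = [1.0 if v >= 0 else -1.0 for v in A[i0]]    # conj(a)/|a| for real entries
attained_1 = norm1(matvec(A, e)) / norm1(e)
attained_inf = norminf(matvec(A, s)) / norminf(s)

# Random vectors should never do better than the formulas.
xs = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(2000)]
best_1 = max(norm1(matvec(A, x)) / norm1(x) for x in xs)
best_inf = max(norminf(matvec(A, x)) / norminf(x) for x in xs)
print(attained_1, best_1, induced_1(A))
print(attained_inf, best_inf, induced_inf(A))
```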



Proposition 4.1.5 For an induced norm, the condition $\|B\| < 1$ implies that $I_n - B$ is invertible, with inverse given by the sum of the series

$$\sum_{k=0}^{\infty} B^k.$$

Proof

The series $\sum_k B^k$ is normally convergent, since $\|B^k\| \le \|B\|^k$, where the latter series converges because $\|B\| < 1$. Since $M_n(K)$ is complete, the series converges. Furthermore, $(I_n - B) \sum_{k=0}^{N} B^k = I_n - B^{N+1}$, which tends to $I_n$. The sum of the series is thus the inverse of $I_n - B$. One has, moreover,

$$\|(I_n - B)^{-1}\| \le \sum_k \|B\|^k = \frac{1}{1 - \|B\|}.$$
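A plain-Python sketch of Proposition 4.1.5 using the induced norm $\|\cdot\|_\infty$ (maximal row sum): for a sample matrix $B$ with $\|B\|_\infty = 0.65 < 1$, a truncation of the series $\sum_k B^k$ is multiplied by $I_n - B$ and compared with $I_n$. The matrix and the truncation order are arbitrary choices made for illustration.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def induced_inf(A):
    """Maximal row sum, the norm induced by ||.||_inf."""
    return max(sum(abs(v) for v in row) for row in A)

B = [[0.2, -0.3, 0.1],
     [0.0, 0.4, 0.25],
     [-0.1, 0.2, 0.3]]
n = len(B)
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
beta = induced_inf(B)                      # 0.65 < 1

S = [row[:] for row in I]                  # partial sum I + B + ... + B^k
P = [row[:] for row in I]                  # current power B^k
for _ in range(200):
    P = matmul(P, B)
    S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]

I_minus_B = [[I[i][j] - B[i][j] for j in range(n)] for i in range(n)]
product_matrix = matmul(I_minus_B, S)      # should be close to I_n
residual = max(abs(product_matrix[i][j] - I[i][j]) for i in range(n) for j in range(n))
print(residual)
print(induced_inf(S), 1 / (1 - beta))      # the first should not exceed the second
```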



One can also deduce Proposition 4.1.5 from the following statement.
Proposition 4.1.6 For every induced norm, one has

$$\rho(A) \le \|A\|.$$

Proof 

The case $K = \mathbb{C}$ is easy, because there exists an eigenvector $X \in E$ associated to an eigenvalue $\lambda$ of modulus $\rho(A)$:

$$\rho(A) \|X\| = \|\lambda X\| = \|AX\| \le \|A\| \, \|X\|.$$

If $K = \mathbb{R}$, one needs a more involved trick.

Let us choose a norm on $\mathbb{C}^n$ and let us denote by $N$ the induced norm on $M_n(\mathbb{C})$. We still denote by $N$ its restriction to $M_n(\mathbb{R})$; it is a norm. Since this space has finite dimension, any two norms are equivalent: There exists $C > 0$ such that $N(B) \le C \|B\|$ for every $B$ in $M_n(\mathbb{R})$. Using the result already proved in the complex case, one has for every $m \in \mathbb{N}$ that

$$\rho(A)^m = \rho(A^m) \le N(A^m) \le C \|A^m\| \le C \|A\|^m.$$

Taking the $m$th root and letting $m$ tend to infinity, and noticing that $C^{1/m}$ tends to 1, one obtains the announced inequality. ■

In general, the equality does not hold. For example, if $A$ is nilpotent though nonzero, one has $\rho(A) = 0 < \|A\|$ for every matrix norm.
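The following plain-Python sketch illustrates Proposition 4.1.6 with the induced norm $\|\cdot\|_\infty$. The real matrix $A$ below (our own example) has the complex eigenvalues $\pm i\sqrt{2}$, so $\rho(A) = \sqrt{2} < 2 = \|A\|_\infty$. The numbers $\|A^m\|_\infty^{1/m}$ never go below $\rho(A)$, and in this example they equal it for every even $m$; their convergence to $\rho(A)$ in general is the subject of Section 4.4. The nilpotent matrix $N$ shows that the inequality may be strict.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def induced_inf(A):
    """Maximal row sum, the norm induced by ||.||_inf."""
    return max(sum(abs(v) for v in row) for row in A)

A = [[0.0, -2.0], [1.0, 0.0]]   # eigenvalues ±i*sqrt(2), so rho(A) = sqrt(2)
N = [[0.0, 1.0], [0.0, 0.0]]    # nonzero nilpotent: rho(N) = 0

roots = []                      # roots[m-1] = ||A^m||_inf^(1/m)
P = A
for m in range(1, 41):
    roots.append(induced_inf(P) ** (1.0 / m))
    P = matmul(P, A)
print(roots[0], roots[1], roots[4], roots[39])   # 2.0, then values approaching sqrt(2)
print(induced_inf(N), matmul(N, N))               # ||N|| = 1 although N^2 = 0
```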

Proposition 4.1.7 Let $\|\cdot\|$ be a norm on $K^n$ and $P \in GL_n(K)$. Then $N(x) := \|Px\|$ defines a norm on $K^n$. Denoting still by $\|\cdot\|$ and $N$ the induced norms on $M_n(K)$, one has $N(A) = \|PAP^{-1}\|$.

Proof

Using the change of dummy variable $y = Px$, we have

$$N(A) = \sup_{x \neq 0} \frac{\|PAx\|}{\|Px\|} = \sup_{y \neq 0} \frac{\|PAP^{-1} y\|}{\|y\|} = \|PAP^{-1}\|.$$



4.2 Householder’s Theorem 

Householder's theorem is a kind of converse of the inequality $\rho(B) \le \|B\|$.

Theorem 4.2.1 For every $B \in M_n(\mathbb{C})$ and all $\varepsilon > 0$, there exists a norm on $\mathbb{C}^n$ such that for the induced norm

$$\|B\| \le \rho(B) + \varepsilon.$$

In other words, $\rho(B)$ is the infimum of $\|B\|$, as $\|\cdot\|$ ranges over the set of matrix norms.

Proof 

From Theorem 2.7.1 there exists $P \in GL_n(\mathbb{C})$ such that $T := PBP^{-1}$ is upper triangular. From Proposition 4.1.7, one has

$$\inf \|B\| = \inf \|PBP^{-1}\| = \inf \|T\|,$$

where the infimum is taken over the set of induced norms. Since $B$ and $T$ have the same spectra, hence the same spectral radius, it is enough to prove the theorem for upper triangular matrices.

For such a matrix $T$, Proposition 4.1.7 still gives

$$\inf \|T\| \le \inf\{\|QTQ^{-1}\|_2 ; Q \in GL_n(\mathbb{C})\}.$$

Let us now take $Q(\mu) = \operatorname{diag}(1, \mu, \mu^2, \dots, \mu^{n-1})$. The matrix $Q(\mu) T Q(\mu)^{-1}$ is upper triangular, with the same diagonal as that of $T$. Indeed, the entry with indices $(i, j)$ becomes $\mu^{i-j} t_{ij}$. Hence,

$$\lim_{\mu \to \infty} Q(\mu) T Q(\mu)^{-1}$$

is simply the matrix $D = \operatorname{diag}(t_{11}, \dots, t_{nn})$. Since $\|\cdot\|_2$ is continuous (as is every norm), one deduces

$$\inf \|T\| \le \lim_{\mu \to \infty} \|Q(\mu) T Q(\mu)^{-1}\|_2 = \|D\|_2 = \sqrt{\rho(D^* D)} = \max_j |t_{jj}| = \rho(T).$$

■ 
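The scaling in the proof of Householder's theorem can be watched numerically. The sketch below (plain Python; the triangular matrix is an arbitrary example and `scaled` is a hypothetical helper) forms $Q(\mu) T Q(\mu)^{-1}$, whose $(i, j)$ entry is $\mu^{i-j} t_{ij}$. It measures the result with $\|\cdot\|_\infty$ rather than $\|\cdot\|_2$, a substitution that is harmless here because $\|D\|_\infty = \max_j |t_{jj}|$ as well; by Proposition 4.1.7, each printed value is the norm of $T$ induced by $x \mapsto \|Q(\mu) x\|_\infty$.

```python
def induced_inf(A):
    """Maximal row sum, the norm induced by ||.||_inf."""
    return max(sum(abs(v) for v in row) for row in A)

def scaled(T, mu):
    """Q(mu) T Q(mu)^{-1} with Q(mu) = diag(1, mu, ..., mu^(n-1)): entry (i, j) is mu^(i-j) t_ij."""
    n = len(T)
    return [[mu ** (i - j) * T[i][j] for j in range(n)] for i in range(n)]

T = [[0.5, 10.0, -3.0],
     [0.0, -0.8, 7.0],
     [0.0, 0.0, 0.3]]
rho = max(abs(T[i][i]) for i in range(len(T)))   # 0.8: spectral radius of a triangular matrix
for mu in [1.0, 10.0, 100.0, 1e4, 1e6]:
    print(mu, induced_inf(scaled(T, mu)))        # decreases toward rho as mu grows
```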

Remark: The theorem tells us that $\rho(A) = \Lambda(A)$, where

$$\Lambda(A) := \inf \|A\|,$$

the infimum being taken over the set of matrix norms. The first part of the proof tells us that $\rho$ and $\Lambda$ coincide on the set of diagonalizable matrices, which is a dense subset of $M_n(\mathbb{C})$. But this is insufficient to conclude, since $\Lambda$ is a priori only upper semicontinuous, as the infimum of continuous functions. The continuity of $\Lambda$ is actually a consequence of the theorem.



4.3 An Interpolation Inequality 

Theorem 4.3.1 (case $K = \mathbb{C}$) Let $\|\cdot\|_p$ be the norm on $M_n(\mathbb{C})$ induced by the norm $\ell^p$ on $\mathbb{C}^n$. The function

$$[0, 1] \to \mathbb{R}, \qquad 1/p \mapsto \log \|A\|_p,$$

is convex. In other words, if $1/r = \theta/p + (1 - \theta)/q$ with $\theta \in (0, 1)$, then

$$\|A\|_r \le \|A\|_p^{\theta} \|A\|_q^{1-\theta}.$$

Remark:

1. The proof uses the fact that $K = \mathbb{C}$. However, the norms induced by the $\|\cdot\|_p$'s on $M_n(\mathbb{R})$ and $M_n(\mathbb{C})$ take the same values on real matrices, even though their definitions are different (see Exercise 6). The statement is thus still true in $M_n(\mathbb{R})$.

2. The case $(p, q, r) = (1, \infty, 2)$ admits a direct proof. See the exercises.

3. The result still holds true in infinite dimension, at the expense of some functional analysis. One can even take different norms on the source and target spaces. Here is an example:

Theorem 4.3.2 (Riesz–Thorin) Let $\Omega$ be an open set in $\mathbb{R}^D$ and $\omega$ an open set in $\mathbb{R}^d$. Let $p_0, p_1, q_0, q_1$ be four numbers in $[1, +\infty]$. Let $\theta \in [0, 1]$ and $p, q$ be defined by

$$\frac{1}{p} = \frac{1 - \theta}{p_0} + \frac{\theta}{p_1}, \qquad \frac{1}{q} = \frac{1 - \theta}{q_0} + \frac{\theta}{q_1}.$$

Consider a linear operator $T$ defined on $L^{p_0} \cap L^{p_1}(\Omega)$, taking values in $L^{q_0} \cap L^{q_1}(\omega)$. Assume that $T$ can be extended as a continuous operator from $L^{p_j}(\Omega)$ to $L^{q_j}(\omega)$, with norm $M_j$, $j = 0, 1$:

$$M_j := \sup_{f \neq 0} \frac{\|Tf\|_{q_j}}{\|f\|_{p_j}}.$$

Then $T$ can be extended as a continuous operator from $L^p(\Omega)$ to $L^q(\omega)$, and its norm is bounded above by

$$M_0^{1-\theta} M_1^{\theta}.$$

4. A fundamental application is the continuity of the Fourier transform from $L^p(\mathbb{R}^d)$ into its dual $L^{p'}(\mathbb{R}^d)$ when $1 \le p \le 2$. We have only to observe that $(p_0, p_1, q_0, q_1) = (1, 2, +\infty, 2)$ is suitable. It can be proved by inspection that every pair $(p, q)$ such that the Fourier transform is continuous from $L^p(\mathbb{R}^d)$ into $L^q(\mathbb{R}^d)$ has the form $(p, p')$ with $1 \le p \le 2$.

5. One has analogous results for Fourier series. There lies the origin of the Riesz–Thorin theorem.



Proof (due to F. Riesz) 

Let us fix x and y in AT". We have to bound 






^ ^ ^jk^jVk ■ 
j.k 
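Let $B$ be the strip in the complex plane defined by $\Re z \in [0, 1]$. Given $z \in B$, define (conjugate) exponents $r(z)$ and $r'(z)$ by

$$\frac{1}{r(z)} = \frac{z}{p} + \frac{1-z}{q}, \qquad \frac{1}{r'(z)} = \frac{z}{p'} + \frac{1-z}{q'}.$$

Set

$$X_j(z) := |x_j|^{r/r(z) - 1} x_j = x_j \exp\Big(\Big(\frac{r}{r(z)} - 1\Big)\log|x_j|\Big),$$

$$Y_k(z) := |y_k|^{r'/r'(z) - 1} y_k = y_k \exp\Big(\Big(\frac{r'}{r'(z)} - 1\Big)\log|y_k|\Big),$$

with $X_j(z) := 0$ if $x_j = 0$ and $Y_k(z) := 0$ if $y_k = 0$. Since $1/r(z)$ is affine in $z$ with real coefficients, $\Re(1/r(z)) = 1/r(\Re z)$, and we then have

$$\|X(z)\|_{r(\Re z)} = \|x\|_r^{r/r(\Re z)}, \qquad \|Y(z)\|_{r'(\Re z)} = \|y\|_{r'}^{r'/r'(\Re z)}.$$

Next, define a holomorphic map in the strip $B$ by $f(z) := \langle AX(z), Y(z)\rangle$. It is bounded, because the numbers $X_j(z)$ and $Y_k(z)$ are. For example,

$$|X_j(z)| = |x_j|^{r/r(\Re z)}$$

lies between $|x_j|^{r/p}$ and $|x_j|^{r/q}$.

Let us set $M(\theta) = \sup\{|f(z)| ; \Re z = \theta\}$. Hadamard's three lines lemma (see [29], Chapter 12, exercise 8) expresses that

$$\theta \mapsto \log M(\theta)$$

is convex on $[0, 1]$. Moreover, $r(0) = q$, $r(1) = p$, $r'(0) = q'$, $r'(1) = p'$, $r(\theta) = r$, $r'(\theta) = r'$, $X(\theta) = x$, and $Y(\theta) = y$. Hence

$$|\langle Ax, y\rangle| = |f(\theta)| \le M(\theta) \le M(1)^{\theta} M(0)^{1-\theta}.$$

Now we have

$$\begin{aligned} M(1) &= \sup\{|f(z)| ; \Re z = 1\} \\ &\le \sup\{\|AX(z)\|_{r(1)} \|Y(z)\|_{r'(1)} ; \Re z = 1\} \\ &= \sup\{\|AX(z)\|_p \|Y(z)\|_{p'} ; \Re z = 1\} \\ &\le \|A\|_p \sup\{\|X(z)\|_p \|Y(z)\|_{p'} ; \Re z = 1\} = \|A\|_p \|x\|_r^{r/p} \|y\|_{r'}^{r'/p'}. \end{aligned}$$

Likewise, $M(0) \le \|A\|_q \|x\|_r^{r/q} \|y\|_{r'}^{r'/q'}$. Hence

$$|\langle Ax, y\rangle| \le \|A\|_p^{\theta} \|A\|_q^{1-\theta} \|x\|_r^{r(\theta/p + (1-\theta)/q)} \|y\|_{r'}^{r'(\theta/p' + (1-\theta)/q')} = \|A\|_p^{\theta} \|A\|_q^{1-\theta} \|x\|_r \|y\|_{r'}.$$

Finally,

$$\|Ax\|_r = \sup_{y \ne 0} \frac{|\langle Ax, y\rangle|}{\|y\|_{r'}} \le \|A\|_p^{\theta} \|A\|_q^{1-\theta} \|x\|_r,$$

which proves the theorem.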
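The particular case $(p, q, r) = (1, \infty, 2)$, $\theta = 1/2$, reads $\|A\|_2 \le \|A\|_1^{1/2}\|A\|_\infty^{1/2}$ and is easy to test numerically. The following Python sketch is only an illustration, not part of the proof; the helper names are ours. It computes $\|A\|_1$ and $\|A\|_\infty$ as the largest column and row sums, and estimates $\|A\|_2 = \rho(A^TA)^{1/2}$ by power iteration on $G = A^TA$. Since $\|Gv\|_2 \le \rho(G)$ for every unit vector $v$, the estimate never exceeds the true value, so slow convergence cannot produce a false violation.

```python
import math
import random


def norm_1(A):
    """Induced l^1 norm: largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Induced l^infinity norm: largest row sum of absolute values."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_2(A, iters=300):
    """Estimate ||A||_2 = rho(A^T A)^(1/2) for a real matrix by power iteration.

    For G = A^T A and a unit vector v, ||G v|| <= rho(G), so the
    returned value never exceeds the true ||A||_2.
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(t * t for t in w))
        if lam == 0.0:
            return 0.0
        v = [t / lam for t in w]
    return math.sqrt(lam)


def riesz_thorin_gap(A):
    """Return sqrt(||A||_1 ||A||_inf) - ||A||_2, which should be >= 0."""
    return math.sqrt(norm_1(A) * norm_inf(A)) - norm_2(A)


if __name__ == "__main__":
    random.seed(0)
    for _ in range(100):
        A = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
        assert riesz_thorin_gap(A) >= -1e-12
    print("||A||_2 <= sqrt(||A||_1 ||A||_inf) held in all trials")
```

For $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ one gets $\|A\|_1 = 6$, $\|A\|_\infty = 7$ and $\|A\|_2 = (15 + \sqrt{221})^{1/2} \approx 5.465 < \sqrt{42}$.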
4.4 A Lemma about Banach Algebras



Definition 4.4.1 A normed algebra is a $K$-algebra endowed with a norm satisfying $\|xy\| \le \|x\|\,\|y\|$. Such a norm is called an algebra norm. When a normed algebra is complete (which is always true in finite dimension), it is called a Banach algebra.

Lemma 4.4.1 Let $\mathcal{A}$ be a normed algebra and let $x \in \mathcal{A}$. The sequence $u_m := \|x^m\|^{1/m}$ converges to its infimum, denoted by $r(x)$. Additionally, if $K = \mathbb{C}$, and if $\mathcal{A}$ has a unit element $e$ and is complete, then $1/r(x)$ is the radius of the largest open ball $B(0; R)$ such that $e - zx$ is invertible for every $z \in B(0; R)$.

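Of course, one may apply the lemma to $\mathcal{A} = M_n(\mathbb{C})$ endowed with a matrix norm. One then has $r(A) = \rho(A)$, because $e - zx = I_n - zA$ is invertible provided that $z$ is not the inverse of an eigenvalue. In the case $K = \mathbb{R}$, one uses an auxiliary norm $N$ that is the restriction to $M_n(\mathbb{R})$ of an induced norm on $M_n(\mathbb{C})$. Since $\|\cdot\|$ and $N$ are equivalent, there exist constants $0 < c \le C$ such that $c\,N(A^m) \le \|A^m\| \le C\,N(A^m)$, and one simply writes

$$c^{1/m} N(A^m)^{1/m} \le \|A^m\|^{1/m} \le C^{1/m} N(A^m)^{1/m}.$$

The sequence $N(A^m)^{1/m}$ converges to $\rho(A)$ from the lemma, and $c^{1/m}, C^{1/m} \to 1$; this implies the convergence of $\|A^m\|^{1/m}$ to $\rho(A)$. We thus have the following result.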

Proposition 4.4.1 If $A \in M_n(K)$, then

$$\rho(A) = \lim_{m \to \infty} \|A^m\|^{1/m}$$

for every matrix norm.

Proof

Convergence. The result is trivial if $x^m = 0$ for some exponent. In the opposite case, we use the following inequalities, which come directly from the definition:

$$\|x^{\alpha+\beta}\| \le \|x^{\alpha}\|\,\|x^{\beta}\|, \qquad \forall \alpha, \beta \in \mathbb{N}.$$

We then define

$$v_m := \frac{1}{m}\log\|x^m\| = \log u_m.$$

Let us fix an integer $p$ and perform Euclidean division of $m$ by $p$: $m = \alpha p + r$ with $0 \le r \le p - 1$ (with the convention $r v_r := 0$ when $r = 0$). This yields

$$v_{\alpha p + r} \le \frac{\alpha p\, v_p + r\, v_r}{\alpha p + r}.$$

As $m$, hence $\alpha$, tends to infinity, the right-hand side converges to $v_p$, because $r v_r$ remains bounded:

$$\limsup_{m \to \infty} v_m \le v_p.$$

Since this holds true for every $p$, we conclude that

$$\limsup_{m \to \infty} v_m \le \inf_p v_p \le \liminf_{m \to \infty} v_m,$$

which proves the convergence to the infimum.



Characterization (complex case). If $R < 1/r(x)$, the Taylor series

$$\sum_{m \in \mathbb{N}} z^m x^m, \qquad z \in \mathbb{C},$$

converges in norm in the ball $B(0; R)$. Its sum equals $(e - zx)^{-1}$ (see the proof of Proposition 4.1.5).

The domain of the map $z \mapsto (e - zx)^{-1}$ is open, since if it contains a point $z_0$, the previous paragraph shows that $e - (z - z_0)(e - z_0 x)^{-1} x$ is invertible for every $z$ satisfying

$$|z - z_0|\, r\big((e - z_0 x)^{-1} x\big) < 1.$$

Denoting by $X_z$ the inverse, we see that $X_z (e - z_0 x)^{-1}$ is an inverse of $e - zx$. In particular, $f : z \mapsto (e - zx)^{-1}$ is holomorphic, since near each point $z_0$ of its domain it is the sum of a convergent power series in $z - z_0$.

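If $f$ is defined on a ball $B(0; s)$, then for every $t \in (0, s)$ Cauchy's formula

$$x^m = \frac{f^{(m)}(0)}{m!} = \frac{1}{2i\pi} \oint_{|z| = t} \frac{f(z)}{z^{m+1}}\, dz$$

shows that $\|x^m\| = O(t^{-m})$. Hence $r(x) \le 1/t$ for every $t < s$, that is, $1/r(x) \ge s$.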
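Proposition 4.4.1 can be observed numerically. The sketch below (plain Python; the helper names are ours) uses the norm $\|\cdot\|_\infty$ and the matrix $A = \begin{pmatrix} 1/2 & 1 \\ 0 & 1/2 \end{pmatrix}$, for which $\rho(A) = 1/2$ while $\|A\|_\infty = 3/2$. Writing $A = \frac12(I_2 + 2N)$ with $N^2 = 0$ gives $\|A^m\|_\infty = 2^{-m}(1 + 2m)$, so $u_m = \frac12 (1 + 2m)^{1/m}$ converges to $1/2$ slowly and from above, consistently with $\|A^m\| \ge \rho(A)^m$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def norm_inf(A):
    """Induced l^infinity norm (largest absolute row sum)."""
    return max(sum(abs(a) for a in row) for row in A)


def gelfand_sequence(A, m_max):
    """Return [u_1, ..., u_{m_max}] with u_m = ||A^m||_inf ** (1/m)."""
    power = A
    seq = [norm_inf(power)]
    for m in range(2, m_max + 1):
        power = mat_mul(power, A)
        seq.append(norm_inf(power) ** (1.0 / m))
    return seq


A = [[0.5, 1.0], [0.0, 0.5]]          # rho(A) = 1/2, ||A||_inf = 3/2
u = gelfand_sequence(A, 200)
print(u[0], u[9], u[-1])             # slowly approaches 0.5 from above
```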



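Corollary 4.4.1 Let $B \in M_n(K)$ be given. Then $B^m \to 0$ as $m \to \infty$ if and only if $\rho(B) < 1$.

Indeed, $\rho(B) \ge 1$ implies $\|B^m\| \ge \rho(B^m) = \rho(B)^m \ge 1$ for every $m$ and every induced norm. Conversely, $\rho(B) < 1$ implies, by Proposition 4.4.1, $\|B^m\| \le r^m$ for $m$ large enough, where $r$ is selected in $(\rho(B), 1)$.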

■ 

We observe that this result is also a consequence of Householder’s 
theorem. 
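Corollary 4.4.1 is a statement about the spectral radius, not about a particular norm: $B^m$ may grow for a while before it decays. The following sketch (plain Python, illustrative helper names) takes $B = \begin{pmatrix} 0.9 & 10 \\ 0 & 0.9 \end{pmatrix}$, for which $\|B\|_\infty = 10.9$ but $\rho(B) = 0.9$; the entry $(B^m)_{12} = 10\,m\,(0.9)^{m-1}$ first grows and then tends to zero. The matrix $C = \operatorname{diag}(1, 1/2)$, with $\rho(C) = 1$, illustrates the other direction.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_power(B, m):
    """B**m for m >= 1, by repeated multiplication."""
    P = B
    for _ in range(m - 1):
        P = mat_mul(P, B)
    return P


def max_abs_entry(A):
    """Largest modulus of an entry of A."""
    return max(abs(a) for row in A for a in row)


B = [[0.9, 10.0], [0.0, 0.9]]   # rho(B) = 0.9 < 1, although ||B||_inf = 10.9
C = [[1.0, 0.0], [0.0, 0.5]]    # rho(C) = 1: C**m does not tend to 0
for m in (1, 10, 100, 500):
    print(m, max_abs_entry(mat_power(B, m)), max_abs_entry(mat_power(C, m)))
```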



4.5 The Gershgorin Domain 



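Let $A \in M_n(\mathbb{C})$, let $\lambda$ be an eigenvalue and $x$ an associated eigenvector. Let $i$ be an index such that $|x_i| = \|x\|_\infty$. Then $x_i \ne 0$ and, since $(\lambda - a_{ii}) x_i = \sum_{j \ne i} a_{ij} x_j$,

$$|a_{ii} - \lambda| = \Big|\sum_{j \ne i} a_{ij} \frac{x_j}{x_i}\Big| \le \sum_{j \ne i} |a_{ij}|.$$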

Proposition 4.5.1 (Gershgorin) The spectrum of $A$ is included in the Gershgorin domain $\mathcal{G}(A)$, defined as the union of the Gershgorin disks

$$D_i := D\Big(a_{ii}; \sum_{j \ne i} |a_{ij}|\Big).$$
This result can also be deduced from Proposition 4.1.5: Let us decompose $A = D + C$, where $D$ is the diagonal part of $A$. If $\lambda \ne a_{ii}$ for every $i$, then $\lambda I_n - A = (\lambda I_n - D)(I_n - B)$ with $B = (\lambda I_n - D)^{-1} C$. Hence, if $\lambda$ is an eigenvalue, then either $\lambda$ is one of the $a_{ii}$, or $\|B\|_\infty \ge 1$, that is, $|\lambda - a_{ii}| \le \sum_{j \ne i} |a_{ij}|$ for some $i$.
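Proposition 4.5.1 is easy to test on small examples. The sketch below (plain Python; function names are ours) builds the disks $D_i$ and checks that the eigenvalues of random complex $2 \times 2$ matrices, computed as the roots of $X^2 - (\operatorname{tr} A)X + \det A$, lie in $\mathcal{G}(A)$. The small tolerance only absorbs rounding errors.

```python
import cmath
import random


def gershgorin_disks(A):
    """List of (center a_ii, radius sum_{j != i} |a_ij|)."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def in_gershgorin_domain(z, A, tol=1e-9):
    """True if z lies in the union of the Gershgorin disks of A."""
    return any(abs(z - c) <= r + tol for c, r in gershgorin_disks(A))


def eig2(A):
    """Both eigenvalues of a 2x2 complex matrix, roots of X^2 - tr X + det."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    s = cmath.sqrt(tr * tr - 4 * det)
    return (tr + s) / 2, (tr - s) / 2


random.seed(1)
for _ in range(1000):
    A = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
         for _ in range(2)]
    assert all(in_gershgorin_domain(lam, A) for lam in eig2(A))
```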

One may improve this result by considering the connected components of $\mathcal{G} = \mathcal{G}(A)$. Let $G$ be one of them. It is the union of the $D_i$'s that meet it. Let $p$ be the number of such disks. One then has $G = \bigcup_{i \in I} D_i$ where $I$ has cardinality $p$.

Theorem 4.5.1 There are exactly p eigenvalues of A in G, counted with 
their multiplicities. 



Proof

For $r \in [0, 1]$, we define a matrix $A(r)$ by the formula

$$a_{ij}(r) := \begin{cases} a_{ii}, & j = i, \\ r\, a_{ij}, & j \ne i. \end{cases}$$

It is clear that the Gershgorin domain $\mathcal{G}_r$ of $A(r)$ is included in $\mathcal{G}$. We observe that $A(1) = A$, and that $r \mapsto A(r)$ is continuous. Let us denote by $m(r)$ the number of eigenvalues (counted with multiplicity) of $A(r)$ that belong to $G$.

Since $G$ and $\mathcal{G} \setminus G$ are compact, one can find a Jordan curve, oriented in the trigonometric sense, that separates $G$ from $\mathcal{G} \setminus G$. Let $\Gamma$ be such a curve. Since $\mathcal{G}_r$ is included in $\mathcal{G}$, the residue formula expresses $m(r)$ in terms of the characteristic polynomial $P_r$ of $A(r)$:

$$m(r) = \frac{1}{2i\pi} \oint_{\Gamma} \frac{P_r'(z)}{P_r(z)}\, dz.$$

Since $P_r$ does not vanish on $\Gamma$ and $r \mapsto P_r, P_r'$ are continuous, we deduce that $r \mapsto m(r)$ is continuous. Since $m(r)$ is an integer and $[0, 1]$ is connected, $m(r)$ remains constant. In particular, $m(0) = m(1)$.

Finally, $m(0)$ is the number of entries $a_{jj}$ (eigenvalues of $A(0)$) that belong to $G$. But $a_{jj}$ is in $G$ if and only if $D_j \subset G$. Hence $m(0) = p$, which implies $m(1) = p$, the desired result.
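Applying Theorem 4.5.1 requires the connected components of $\mathcal{G}$. Two closed disks meet if and only if the distance between their centers is at most the sum of their radii, so the components correspond to the connected components of the graph whose edges join intersecting disks. The sketch below (plain Python with a small union-find; names are ours) computes them. In the example the components are $D_1 \cup D_2$ and $D_3$, and the theorem predicts two eigenvalues in the first and one in the second; here the eigenvalues $3/2 \pm \sqrt{0.61}$ and $10$ are known in closed form because the matrix is block-diagonal.

```python
def gershgorin_disks(A):
    """List of (center a_ii, radius sum_{j != i} |a_ij|)."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def gershgorin_components(A):
    """Partition disk indices into the connected components of G(A)."""
    disks = gershgorin_disks(A)
    parent = list(range(len(disks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, (ci, ri) in enumerate(disks):
        for j in range(i + 1, len(disks)):
            cj, rj = disks[j]
            if abs(ci - cj) <= ri + rj:     # the closed disks D_i and D_j meet
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(disks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def count_in_component(eigenvalues, A, component, tol=1e-9):
    """Number of the given eigenvalues lying in the union of the disks of component."""
    disks = gershgorin_disks(A)
    return sum(1 for lam in eigenvalues
               if any(abs(lam - disks[i][0]) <= disks[i][1] + tol for i in component))


A = [[1.0, 0.6, 0.0], [0.6, 2.0, 0.0], [0.0, 0.0, 10.0]]
eigs = [1.5 - 0.61 ** 0.5, 1.5 + 0.61 ** 0.5, 10.0]
for comp in gershgorin_components(A):
    print(comp, count_in_component(eigs, A, comp))   # p disks, p eigenvalues
```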



An improvement of Gershgorin’s theorem concerns irreducible matrices. 

Proposition 4.5.2 Let $A$ be an irreducible matrix. If an eigenvalue of $A$ does not belong to the interior of any Gershgorin disk, then it belongs to all the circles $S(a_{ii}; \sum_{j \ne i} |a_{ij}|)$.

Proof

Let $\lambda$ be such an eigenvalue and $x$ an associated eigenvector. By assumption, one has $|\lambda - a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$ for every $i$. Let $I$ be the set of indices for which $|x_i| = \|x\|_\infty$ and let $J$ be its complement. If $i \in I$, then

$$\|x\|_\infty \sum_{j \ne i} |a_{ij}| \le |\lambda - a_{ii}|\, \|x\|_\infty = \Big|\sum_{j \ne i} a_{ij} x_j\Big| \le \sum_{j \ne i} |a_{ij}|\, |x_j|.$$

It follows that $\sum_{j \ne i} (\|x\|_\infty - |x_j|)\, |a_{ij}| \le 0$, where all the terms in the sum are nonnegative. Each term is thus zero, so that $a_{ij} = 0$ for $i \in I$ and $j \in J$. Since $A$ is irreducible and $I$ is nonempty, $J$ is empty. One has thus $|x_j| = \|x\|_\infty$ for every $j$, and the previous inequalities, which are then equalities, show that $\lambda$ belongs to every circle.



Definition 4.5.1 A square matrix $A \in M_n(\mathbb{C})$ is said to be

1. diagonally dominant if

$$|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|, \qquad 1 \le i \le n;$$

2. strongly diagonally dominant if in addition at least one of these $n$ inequalities is strict;

3. strictly diagonally dominant if the inequality is strict for every index $i$.



Corollary 4.5.1 Let $A$ be a square matrix. If $A$ is strictly diagonally dominant, or if $A$ is irreducible and strongly diagonally dominant, then $A$ is invertible.

In fact, in the first case zero does not belong to the Gershgorin domain. In the second case, zero is not interior to any disk; since $A$ is irreducible, Proposition 4.5.2 shows that if zero were an eigenvalue, it would belong to every circle $S(a_{ii}; \sum_{j \ne i} |a_{ij}|)$, whereas there exists a disk $D_j$ that does not contain zero.
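The three notions of Definition 4.5.1 can be checked mechanically and compared with a determinant computation. The sketch below is illustrative (plain Python; names are ours; integer or `Fraction` entries avoid rounding issues in the comparisons). The matrix `T` is irreducible and strongly diagonally dominant, hence invertible by Corollary 4.5.1; the matrix `R` is also strongly diagonally dominant but reducible and singular, which shows that the irreducibility assumption cannot be dropped.

```python
def dominance(A):
    """Classify A according to Definition 4.5.1 (most specific label)."""
    n = len(A)
    slack = [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i)
             for i in range(n)]
    if any(s < 0 for s in slack):
        return "not diagonally dominant"
    if all(s > 0 for s in slack):
        return "strictly diagonally dominant"
    if any(s > 0 for s in slack):
        return "strongly diagonally dominant"
    return "diagonally dominant"


def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


T = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]      # irreducible, strongly dominant
R = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]          # reducible, strongly dominant
print(dominance(T), det(T))   # strongly diagonally dominant 4
print(dominance(R), det(R))   # strongly diagonally dominant 0
```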



4.6 Exercises 

1. Under what conditions on the vectors $a, b \in \mathbb{C}^n$ does the matrix $M$ defined by $m_{ij} = a_i b_j$ satisfy $\|M\|_p = 1$ for every $p \in [1, \infty]$?

2. Under what conditions on x, y, and p does the equality in (4.2) or 
(4.1) hold? 

3. Show that

$$\lim_{p \to +\infty} \|x\|_p = \|x\|_\infty, \qquad \forall x \in E.$$

4. A norm on $K^n$ is a strictly convex norm if $\|x\| = \|y\| = 1$, $x \ne y$, and $0 < \theta < 1$ imply $\|\theta x + (1 - \theta) y\| < 1$.

(a) Show that $\|\cdot\|_p$ is strictly convex for $1 < p < \infty$, but is not so for $p = 1, \infty$.

(b) Deduce from Corollary 5.5.1 that the induced norm $\|\cdot\|_p$ is not strictly convex on $M_n(\mathbb{R})$.

5. Let $N$ be a norm on $\mathbb{R}^n$.

(a) For $x \in \mathbb{C}^n$, define

$$N_1(x) := \inf \sum_l |\alpha_l|\, N(x^l),$$

where the infimum is taken over the set of decompositions $x = \sum_l \alpha_l x^l$ with $\alpha_l \in \mathbb{C}$ and $x^l \in \mathbb{R}^n$. Show that $N_1$ is a norm on $\mathbb{C}^n$ (as a $\mathbb{C}$-vector space) whose restriction to $\mathbb{R}^n$ is $N$. Note: $N_1$ is called the complexification of $N$.

(b) Same question as above for $N_2$, defined by

$$N_2(x) := \frac{1}{2\pi} \int_0^{2\pi} [e^{i\theta} x]\, d\theta,$$

where

$$[x] := \sqrt{N(\Re x)^2 + N(\Im x)^2}.$$

(c) Show that $N_2 \le N_1$.

(d) If $N(x) = \|x\|_1$, show that $N_1(x) = \|x\|_1$. Considering then the vector ..., show that $N_2 \ne N_1$.

6. (continuation of exercise 5)

The norms $N$ (on $\mathbb{R}^n$) and $N_1$ (on $\mathbb{C}^n$) lead to induced norms on $M_n(\mathbb{R})$ and $M_n(\mathbb{C})$, respectively. Show that if $M \in M_n(\mathbb{R})$, then $N(M) = N_1(M)$. Deduce that Theorem 4.3.1 holds true in $M_n(\mathbb{R})$.

7. Let $\|\cdot\|$ be an algebra norm on $M_n(K)$ ($K = \mathbb{R}$ or $\mathbb{C}$), that is, a norm satisfying $\|AB\| \le \|A\| \cdot \|B\|$. Show that $\rho(A) \le \|A\|$ for every $A \in M_n(K)$.

8. In $M_n(\mathbb{C})$, let $D$ be a diagonalizable matrix and $N$ a nilpotent matrix that commutes with $D$. Show that $\rho(D) = \rho(D + N)$.

9. Let $B \in M_n(\mathbb{C})$ be given. Assume that there exists an induced norm such that $\|B\| = \rho(B)$. Let $\lambda$ be an eigenvalue of maximal modulus and $X$ a corresponding eigenvector. Show that $X$ does not belong to the range of $B - \lambda I_n$. Deduce that the Jordan block associated to $\lambda$ is diagonal (Jordan reduction is presented in Chapter 6).

10. (continuation of exercise 9)

Conversely, show that if the Jordan blocks of $B$ associated to the eigenvalues of maximal modulus of $B$ are diagonal, then there exists a norm on $\mathbb{C}^n$ such that, using the induced norm, $\rho(B) = \|B\|$.

11. Here is another proof of Theorem 4.2.1. Let $K = \mathbb{R}$ or $\mathbb{C}$, $A \in M_n(K)$, and let $N$ be a norm on $K^n$. If $\epsilon > 0$, we define for all $x \in K^n$

$$\|x\| := \sum_{k \in \mathbb{N}} (\rho(A) + \epsilon)^{-k} N(A^k x).$$

(a) Show that this series is convergent (use Corollary 4.4.1).

(b) Show that $\|\cdot\|$ is a norm on $K^n$.

(c) Show that for the induced norm, $\|A\| \le \rho(A) + \epsilon$.

12. A matrix norm $\|\cdot\|$ on $M_n(\mathbb{C})$ is said to be unitarily invariant if $\|UAV\| = \|A\|$ for every $A \in M_n(\mathbb{C})$ and all unitary matrices $U, V$.

(a) Find, among the most classical norms, two examples of unitarily invariant norms.

(b) Given a unitarily invariant norm, show that there exists a norm $N$ on $\mathbb{R}^n$ such that

$$\|A\| = N(s_1(A), \dots, s_n(A)),$$

where the $s_j(A)$'s, the eigenvalues of $H$ in the polar decomposition $A = QH$ (see Chapter 7 for this notion), are called the singular values of $A$.



13. (R. Bhatia [5]) Suppose we are given a norm $\|\cdot\|$ on $M_n(\mathbb{C})$ that is unitarily invariant (see the previous exercise). If $A \in M_n(\mathbb{C})$, we denote by $D(A)$ the diagonal matrix obtained by keeping only the $a_{jj}$ and setting all the other entries to zero. If $\sigma$ is a permutation, we denote by $A^{\sigma}$ the matrix whose entry of index $(j, k)$ equals $a_{jk}$ if $k = \sigma(j)$, and zero otherwise. For example, $A^{\mathrm{id}} = D(A)$, where $\mathrm{id}$ is the identity permutation. If $r$ is an integer between $1 - n$ and $n - 1$, we denote by $D_r(A)$ the matrix whose entry of index $(j, k)$ equals $a_{jk}$ if $k - j = r$, and zero otherwise. For example, $D_0(A) = D(A)$.

(a) Let $\omega = \exp(2i\pi/n)$ and let $U$ be the diagonal matrix whose diagonal entries are the roots of unity $1, \omega, \dots, \omega^{n-1}$. Show that

$$D(A) = \frac{1}{n} \sum_{j=0}^{n-1} U^{*j} A U^{j}.$$

Deduce that $\|D(A)\| \le \|A\|$.

(b) Show that $\|A^{\sigma}\| \le \|A\|$ for every $\sigma \in S_n$. Observe that $\|P\| = \|I_n\|$ for every permutation matrix $P$. Show that $\|M\| \le \|I_n\|$ for every bistochastic matrix $M$ (see Section 5.5 for this notion).
(c) If $\theta \in \mathbb{R}$, let us denote by $U_\theta$ the diagonal matrix whose $k$th diagonal term equals $\exp(ik\theta)$. Show that

$$D_r(A) = \frac{1}{2\pi} \int_0^{2\pi} e^{ir\theta}\, U_\theta A U_\theta^*\, d\theta.$$

(d) Deduce that $\|D_r(A)\| \le \|A\|$.

(e) Let $p$ be an integer between zero and $n - 1$ and $r = 2p + 1$. Let us denote by $T_r(A)$ the matrix whose entry of index $(j, k)$ equals $a_{jk}$ if $|k - j| \le p$, and zero otherwise. For example, $T_3(A)$ is a tridiagonal matrix. Show that

$$T_r(A) = \frac{1}{2\pi} \int_0^{2\pi} d_p(\theta)\, U_\theta A U_\theta^*\, d\theta,$$

where

$$d_p(\theta) := \sum_{k=-p}^{p} e^{ik\theta}$$

is the Dirichlet kernel.

(f) Deduce that $\|T_r(A)\| \le L_p \|A\|$, where

$$L_p := \frac{1}{2\pi} \int_0^{2\pi} |d_p(\theta)|\, d\theta$$

is the Lebesgue constant (note: $L_p = 4\pi^{-2} \log p + O(1)$).

(g) Let $\Delta(A)$ be the upper triangular matrix whose entries above the diagonal coincide with those of $A$. Using the matrix

$$B = \begin{pmatrix} 0 & \Delta(A)^* \\ \Delta(A) & 0 \end{pmatrix},$$

show that $\|\Delta(A)\|_2 \le L_n \|A\|_2$ (observe that $\|B\|_2 = \|\Delta(A)\|_2$).

(h) What inequality do we obtain for $\Delta_0(A)$, the strictly upper triangular matrix whose entries lying strictly above the diagonal coincide with those of $A$?



14. We endow $\mathbb{C}^n$ with the usual Hermitian structure, so that $M_n(\mathbb{C})$ is equipped with the norm $\|A\| = \rho(A^*A)^{1/2}$.

Suppose we are given a sequence of matrices $(A_j)_{j \in \mathbb{Z}}$ in $M_n(\mathbb{C})$ and a summable sequence $\gamma \in \ell^1(\mathbb{Z})$ of positive real numbers. Assume, finally, that for every pair $(j, k) \in \mathbb{Z} \times \mathbb{Z}$,

$$\|A_j^* A_k\| \le \gamma(j - k)^2, \qquad \|A_j A_k^*\| \le \gamma(j - k)^2.$$

(a) Let $F$ be a finite subset of $\mathbb{Z}$. Let $B_F$ denote the sum of the $A_j$'s as $j$ runs over $F$. Show that

$$\|(B_F^* B_F)^m\| \le \operatorname{card} F\, \|\gamma\|_1^{2m}, \qquad \forall m \in \mathbb{N}.$$

(b) Deduce that $\|B_F\| \le \|\gamma\|_1$.

(c) Show (Cotlar's lemma) that for every $x, y \in \mathbb{C}^n$, the series

$$\sum_{j \in \mathbb{Z}} y^* A_j x$$

is convergent, and that its sum $y^* A x$ defines a matrix $A \in M_n(\mathbb{C})$ that satisfies

$$\|A\| \le \|\gamma\|_1.$$

Hint: For a sequence $(u_j)_{j \in \mathbb{Z}}$ of real numbers, the series $\sum_j u_j$ is absolutely convergent if and only if there exists $M < +\infty$ such that $|\sum_{j \in F} u_j| \le M$ for every finite subset $F$.

(d) Deduce that the series $\sum_j A_j$ converges in $M_n(\mathbb{C})$. May one conclude that it converges normally?

15. Let $\|\cdot\|$ be an induced norm on $M_n(\mathbb{R})$. We wish to characterize the matrices $B \in M_n(\mathbb{R})$ such that there exist $\epsilon_0 > 0$ and $\omega > 0$ with

$$(0 < \epsilon < \epsilon_0) \implies (\|I_n - \epsilon B\| \le 1 - \omega\epsilon).$$

(a) For the norm $\|\cdot\|_\infty$, it is equivalent that $B$ be strictly diagonally dominant with positive diagonal entries.

(b) What is the characterization for the norm $\|\cdot\|_1$?

(c) For the norm $\|\cdot\|_2$, it is equivalent that $B^T + B$ be positive definite.

16. If $A \in M_n(\mathbb{C})$ and $j = 1, \dots, n$ are given, we define $r_j(A) := \sum_{k \ne j} |a_{jk}|$. For $i \ne j$, define

$$B_{ij}(A) = \{z \in \mathbb{C} ; |(z - a_{ii})(z - a_{jj})| \le r_i(A)\, r_j(A)\}.$$

These sets are Cassini ovals. Finally, let

$$B(A) := \bigcup_{1 \le i < j \le n} B_{ij}(A).$$

(a) Show that $\operatorname{Sp} A \subset B(A)$.

(b) Show that this result is sharper than Proposition 4.5.1.

(c) When $n = 2$, show that in fact $\operatorname{Sp} A$ is included in the boundary of $B(A)$.

17. Let $B \in M_n(\mathbb{C})$.

(a) Returning to the proof of Theorem 4.2.1, show that for every $\epsilon > 0$ there exists on $\mathbb{C}^n$ a Hermitian norm $\|\cdot\|$ such that for the induced norm $\|B\| \le \rho(B) + \epsilon$.

(b) Deduce that $\rho(B) < 1$ holds if and only if there exists a matrix $A \in \mathrm{HPD}_n$ such that $A - B^*AB \in \mathrm{HPD}_n$.

18. For $A \in M_n(\mathbb{C})$, define

$$\epsilon := \max_{i \ne j} |a_{ij}|, \qquad \delta := \min_{i \ne j} |a_{ii} - a_{jj}|.$$

We assume in this exercise that $\delta > 0$ and $\epsilon < \delta/(4n)$.

(a) Show that each Gershgorin disk $D_j$ contains exactly one eigenvalue of $A$.

(b) Let $\rho > 0$ be a real number. Show that $A_\rho$, obtained by multiplying the $i$th row of $A$ by $\rho$ and the $i$th column by $1/\rho$, has the same eigenvalues as $A$.

(c) Choose $\rho = 2\epsilon/\delta$. Show that the $i$th Gershgorin disk of $A_\rho$ contains exactly one eigenvalue. Deduce that the eigenvalues of $A$ are simple and that

$$d(\operatorname{Sp}(A), \operatorname{diag}(A)) \le \cdots,$$

where $\operatorname{diag}(A) = \{a_{11}, \dots, a_{nn}\}$.



19. Let $A \in M_n(\mathbb{C})$ be a diagonalizable matrix:

$$A = S \operatorname{diag}(d_1, \dots, d_n) S^{-1}.$$

Let $\|\cdot\|$ be an induced norm for which $\|D\| = \max_j |d_j|$ holds, where $D := \operatorname{diag}(d_1, \dots, d_n)$. Show that for every $E \in M_n(\mathbb{C})$ and for every eigenvalue $\lambda$ of $A + E$, there exists an index $j$ such that

$$|\lambda - d_j| \le \|S\| \cdot \|S^{-1}\| \cdot \|E\|.$$

20. Let $A \in M_n(K)$, with $K = \mathbb{R}$ or $\mathbb{C}$. Give another proof, using the Cauchy-Schwarz inequality, of the following particular case of Theorem 4.3.1:

$$\|A\|_2 \le \|A\|_1^{1/2} \|A\|_\infty^{1/2}.$$

21. Show that if $A \in M_n(\mathbb{C})$ is normal, then $\rho(A) = \|A\|_2$. Deduce that if $A$ and $B$ are normal, $\rho(AB) \le \rho(A)\rho(B)$.



22. Let $N_1$ and $N_2$ be two norms on $\mathbb{C}^n$. Denote by $M_1$ and $M_2$ the induced norms on $M_n(\mathbb{C})$. Let us define

$$R := \max_{x \ne 0} \frac{N_1(x)}{N_2(x)}, \qquad S := \max_{x \ne 0} \frac{N_2(x)}{N_1(x)}.$$

(a) Show that

$$\max_{A \ne 0} \frac{M_1(A)}{M_2(A)} = RS = \max_{A \ne 0} \frac{M_2(A)}{M_1(A)}.$$

(b) Deduce that if $M_1 = M_2$, then $N_2/N_1$ is constant.

(c) Show that if $M_1 \le M_2$, then $N_2/N_1$ is constant and therefore $M_2 = M_1$.



23. (continuation of exercise 22)

Let $\|\cdot\|$ be an algebra norm on $M_n(\mathbb{C})$. If $y \in \mathbb{C}^n$ is nonzero, we define $\|x\|_y := \|xy^*\|$.

(a) Show that $\|\cdot\|_y$ is a norm on $\mathbb{C}^n$ for every $y \ne 0$.

(b) Let $N_y$ be the norm induced by $\|\cdot\|_y$. Show that $N_y \le \|\cdot\|$.

(c) We say that $\|\cdot\|$ is minimal if there exists no other algebra norm less than or equal to $\|\cdot\|$. Show that the following assertions are equivalent:

i. $\|\cdot\|$ is an induced norm on $M_n(\mathbb{C})$.

ii. $\|\cdot\|$ is a minimal norm on $M_n(\mathbb{C})$.

iii. For all $y \ne 0$, one has $\|\cdot\| = N_y$.

24. (continuation of exercise 23)

Let $\|\cdot\|$ be an induced norm on $M_n(\mathbb{C})$.

(a) Let $y, z \ne 0$ be two vectors in $\mathbb{C}^n$. Show that (with the notation of the previous exercise) $\|\cdot\|_y / \|\cdot\|_z$ is constant.

(b) Prove the equality

$$\|xy^*\| \cdot \|zt^*\| = \|xt^*\| \cdot \|zy^*\|.$$

25. Let $M \in M_n(\mathbb{C})$ and $H \in \mathrm{HPD}_n$ be given. Show that

$$\|HMH\|_2 \le \frac{1}{2} \|H^2 M + M H^2\|_2.$$

26. We endow $\mathbb{R}^2$ with the Euclidean norm $\|\cdot\|_2$, and $M_2(\mathbb{R})$ with the induced norm, denoted also by $\|\cdot\|_2$. We denote by $\mathcal{S}$ the unit sphere of $M_2(\mathbb{R})$: $M \in \mathcal{S}$ is equivalent to $\|M\|_2 = 1$, that is, to $\rho(M^T M) = 1$. Similarly, $\mathcal{B}$ denotes the unit ball of $M_2(\mathbb{R})$.

Recall that if $C$ is a convex set and if $P \in C$, then $P$ is called an extremal point if $P \in [Q, R]$ and $Q, R \in C$ imply $Q = R = P$.

(a) Show that the set of extremal points of $\mathcal{B}$ is equal to $O_2(\mathbb{R})$.

(b) Show that $M \in \mathcal{S}$ if and only if there exist two matrices $P, Q \in O_2(\mathbb{R})$ and a number $a \in [0, 1]$ such that

$$M = P \begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} Q.$$

(c) We denote by $\mathcal{R} = SO_2(\mathbb{R})$ the set of rotation matrices, and by $S$ that of matrices of planar symmetry. Recall that $O_2(\mathbb{R})$ is the disjoint union of $\mathcal{R}$ and $S$. Show that $\mathcal{S}$ is the union of the segments $[r, s]$ as $r$ runs over $\mathcal{R}$ and $s$ runs over $S$.

(d) Show that two such “open” segments $(r, s)$ and $(r', s')$ are either disjoint or equal.

(e) Let $M, N \in \mathcal{S}$. Show that $\|M - N\|_2 = 2$ (that is, $(M, N)$ is a diameter of $\mathcal{B}$) if and only if there exists a segment $[r, s]$ ($r \in \mathcal{R}$ and $s \in S$) such that $M \in [r, s]$ and $N \in [-r, -s]$.




5 

Nonnegative Matrices 



In this chapter matrices have real entries in general. In a few specified cases, 
entries might be complex. 



5.1 Nonnegative Vectors and Matrices 

Definition 5.1.1 A vector $x \in \mathbb{R}^n$ is nonnegative, and we write $x \ge 0$, if its coordinates are nonnegative. It is positive, and we write $x > 0$, if its coordinates are (strictly) positive. Furthermore, a matrix $A \in M_{n \times m}(\mathbb{R})$ (not necessarily square) is nonnegative (respectively positive) if its entries are nonnegative (respectively positive); we again write $A \ge 0$ (respectively $A > 0$). More generally, we define an order relationship $x \le y$ whose meaning is $y - x \ge 0$.

Definition 5.1.2 Given $x \in \mathbb{C}^n$, we let $|x|$ denote the nonnegative vector whose coordinates are the numbers $|x_j|$. Similarly, if $A \in M_n(\mathbb{C})$, the matrix $|A|$ has entries $|a_{ij}|$.

Observe that given a matrix and a vector (or two matrices), the triangle inequality implies

$$|Ax| \le |A| \cdot |x|.$$

Proposition 5.1.1 A matrix is nonnegative if and only if $x \ge 0$ implies $Ax \ge 0$. It is positive if and only if $x \ge 0$ and $x \ne 0$ imply $Ax > 0$.



Proof 




Let us assume that $Ax \ge 0$ (respectively $Ax > 0$) for every $x \ge 0$ (respectively $x \ge 0$ and $x \ne 0$). Then the $i$th column is nonnegative (respectively positive), since it is the image of the $i$th vector of the canonical basis. Hence $A \ge 0$ (respectively $A > 0$).

Conversely, $A \ge 0$ and $x \ge 0$ imply trivially $Ax \ge 0$. If $A > 0$, $x \ge 0$, and $x \ne 0$, there exists an index $l$ such that $x_l > 0$. Then

$$(Ax)_i = \sum_j a_{ij} x_j \ge a_{il} x_l > 0,$$

and hence $Ax > 0$.

■ 

An important point is the following: 

Proposition 5.1.2 If $A \in M_n(\mathbb{R})$ is nonnegative and irreducible, then $(I + A)^{n-1} > 0$.

Proof

Let $x$ be a nonnegative, nonzero vector and define $x^m = (I + A)^m x$, which is nonnegative. Let us denote by $P_m$ the set of indices of the nonzero components of $x^m$: $P_0$ is nonempty. Since $x^{m+1}_i \ge x^m_i$, one has $P_m \subset P_{m+1}$. Let us assume that the cardinality $|P_m|$ of $P_m$ is strictly less than $n$. There are thus one or more zero components, whose indices form a nonempty subset $I$, complement of $P_m$. Since $A$ is irreducible, there exists some nonzero entry $a_{ij}$, with $i \in I$ and $j \in P_m$. Then $x^{m+1}_i \ge a_{ij} x^m_j > 0$, which shows that $P_{m+1}$ is not equal to $P_m$, and thus $|P_{m+1}| > |P_m|$. By induction, we deduce that $|P_m| \ge \min\{m + 1, n\}$. Hence $|P_{n-1}| = n$, that is, $(I + A)^{n-1} x > 0$. Since the nonnegative, nonzero vector $x$ is arbitrary, Proposition 5.1.1 gives $(I + A)^{n-1} > 0$.
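Proposition 5.1.2 yields a finite test for irreducibility. Its converse also holds: if $A$ is reducible, then $(I + A)^{n-1}$ is reducible as well and therefore has zero entries; we use this standard fact here without proof. Since the entries of $(I + A)^k$ are sums of products of nonnegative numbers, no cancellation occurs and only the zero pattern matters, so the sketch below (plain Python; names are ours) works with Boolean matrices.

```python
def is_irreducible(A):
    """Nonnegative square A: test whether (I + A)^(n-1) > 0 entrywise."""
    n = len(A)
    M = [[i == j or A[i][j] > 0 for j in range(n)] for i in range(n)]   # pattern of I + A
    P = [[i == j for j in range(n)] for i in range(n)]                  # pattern of I
    for _ in range(n - 1):
        # Boolean product: pattern of (I + A)^(k+1) from the pattern of (I + A)^k
        P = [[any(P[i][k] and M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return all(all(row) for row in P)


print(is_irreducible([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))   # True: a cyclic permutation
print(is_irreducible([[1, 1], [0, 1]]))                    # False: upper triangular
```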



5.2 The Perron-Frobenius Theorem: Weak Form 

Theorem 5.2.1 Let $A \in M_n(\mathbb{R})$ be a nonnegative matrix. Then $\rho(A)$ is an eigenvalue of $A$ associated to a nonnegative eigenvector.

Proof

Let $\lambda$ be an eigenvalue of maximal modulus and $v$ an eigenvector, normalized by $\|v\|_1 = 1$. Then

$$\rho(A)|v| = |\lambda v| = |Av| \le A|v|.$$

Let us denote by $C$ the subset of $\mathbb{R}^n$ (actually a subset of the unit simplex $K_n$) defined by the (in)equalities $\sum_j x_j = 1$, $x \ge 0$, and $Ax \ge \rho(A)x$. This is a closed convex set, nonempty, since it contains $|v|$. Finally, it is bounded, because $x \in C$ implies $0 \le x_j \le 1$ for every $j$; thus it is compact. Let us distinguish two cases:

1. There exists $x \in C$ such that $Ax = 0$. Then $\rho(A)x \le 0$ furnishes $\rho(A) = 0$, and $x$ is a nonnegative eigenvector associated to the eigenvalue $0 = \rho(A)$. The theorem is thus proved in this case.
2. For every $x$ in $C$, $Ax \ne 0$. Then let us define on $C$ a continuous map $f$ by

$$f(x) := \frac{1}{\|Ax\|_1} Ax.$$

It is clear that $f(x) \ge 0$ and that $\|f(x)\|_1 = 1$. Finally, since $A \ge 0$ and $Ax \ge \rho(A)x$,

$$Af(x) = \frac{1}{\|Ax\|_1} AAx \ge \frac{1}{\|Ax\|_1} A\rho(A)x = \rho(A) f(x),$$

so that $f(C) \subset C$. Then Brouwer's theorem (see [3], p. 217) asserts that a continuous function from a compact convex subset of $\mathbb{R}^n$ into itself has a fixed point. Thus let $y$ be a fixed point of $f$. It is a nonnegative eigenvector, associated to the eigenvalue $r = \|Ay\|_1$. Since $y \in C$, we have $ry = Ay \ge \rho(A)y$ and thus $r \ge \rho(A)$, which implies $r = \rho(A)$.



That proof can be adapted to the case where a real number $r$ and a nonzero vector $y$ are given satisfying $y \ge 0$ and $Ay \ge ry$. Just take for $C$ the set of vectors $x$ such that $\sum_i x_i = 1$, $x \ge 0$, and $Ax \ge rx$. We then conclude that $\rho(A) \ge r$.
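The remark above gives computable lower bounds for $\rho(A)$: for $y \ge 0$, $y \ne 0$, the largest $r$ with $Ay \ge ry$ is $\min\{(Ay)_j / y_j : y_j > 0\}$, because coordinates with $y_j = 0$ impose no condition (there $(Ay)_j \ge 0 = r y_j$). A short sketch with exact rational arithmetic; the function name is ours.

```python
from fractions import Fraction


def lower_bound_rho(A, y):
    """Largest r such that A y >= r y, for A >= 0 and y >= 0, y != 0.

    By the remark above, rho(A) >= r.
    """
    n = len(A)
    Ay = [sum(Fraction(A[i][j]) * Fraction(y[j]) for j in range(n)) for i in range(n)]
    return min(Ay[i] / Fraction(y[i]) for i in range(n) if y[i] > 0)


A = [[2, 1], [1, 2]]                 # rho(A) = 3
print(lower_bound_rho(A, [1, 2]))    # 5/2
print(lower_bound_rho(A, [1, 1]))    # 3
```

Choosing $y$ closer to the nonnegative eigenvector improves the bound; for $y = (1, 1)^T$, which is that eigenvector, the bound is attained.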



5.3 The Perron-Frobenius Theorem: Strong Form 

Theorem 5.3.1 Let $A \in M_n(\mathbb{R})$ be a nonnegative irreducible matrix. Then $\rho(A)$ is a simple eigenvalue of $A$, associated to a positive eigenvector. Moreover, $\rho(A) > 0$.
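For a positive matrix, $\rho(A)$ and the positive eigenvector of Theorem 5.3.1 can be approximated by the power method, normalized in $\|\cdot\|_1$ as in the proofs of this chapter. This is a sketch under an extra assumption: the plain power method is guaranteed to converge when $\rho(A)$ strictly dominates the other eigenvalues in modulus (a standard fact when $A > 0$), which fails for some irreducible matrices (see Remark 1 below). The function name is ours.

```python
def perron_power_method(A, iters=200):
    """Approximate (rho(A), x) with A x = rho(A) x, x > 0, ||x||_1 = 1.

    Assumes A >= 0 and that rho(A) strictly dominates the other eigenvalues
    in modulus (true for instance when A > 0); otherwise it may not converge.
    """
    n = len(A)
    x = [1.0 / n] * n                      # positive start with ||x||_1 = 1
    r = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        r = sum(y)                         # = ||A x||_1 because A x >= 0
        x = [t / r for t in y]
    return r, x


r, x = perron_power_method([[1.0, 2.0], [3.0, 4.0]])
print(r, x)    # r close to (5 + sqrt(33)) / 2, x positive
```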



5.3.1 Remarks

1. Though the Perron-Frobenius theorem says that $\rho(A)$ is a simple eigenvalue, it does not tell anything about the other eigenvalues of maximal modulus. The following example, whose eigenvalues are $1$ and $-1$, shows that such other eigenvalues may exist:

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The existence of several eigenvalues of maximal modulus will be studied in Section 5.4.

2. One obtains another proof of the weak form of the Perron-Frobenius theorem by applying the strong form to $A + \alpha J$, where $J > 0$ and $\alpha > 0$, then letting $\alpha$ tend to zero.
3. Without the irreducibility assumption, $\rho(A)$ may be a multiple eigenvalue, and a nonnegative eigenvector may not be positive. This holds for a matrix of size $n = 2m$ that reads blockwise

$$A = \begin{pmatrix} B & 0_m \\ I_m & B \end{pmatrix}.$$

Here, $\rho(A) = \rho(B)$, and every eigenvalue has an even algebraic multiplicity. Moreover, if $\rho(B)$ is a simple eigenvalue of $B$, associated to the eigenvector $Z > 0$, then the kernel of $A - \rho(A)I_n$ is spanned by

$$\begin{pmatrix} 0_m \\ Z \end{pmatrix},$$

which is not positive.



Proof

For $r \ge 0$, we denote by $C_r$ the set of vectors of $\mathbb{R}^n$ defined by the (in)equalities

$$x \ge 0, \qquad \|x\|_1 = 1, \qquad Ax \ge rx.$$

Each $C_r$ is a convex compact set. We saw in the previous section that if $\lambda$ is an eigenvalue associated to an eigenvector $x$ of unit norm $\|x\|_1 = 1$, then $|x| \in C_{|\lambda|}$. In particular, $C_{\rho(A)}$ is nonempty. Conversely, if $C_r$ is nonempty, then for $x \in C_r$,

$$r = r\|x\|_1 \le \|Ax\|_1 \le \|A\|_1 \|x\|_1 = \|A\|_1,$$

and therefore $r \le \|A\|_1$. Furthermore, the map $r \mapsto C_r$ is nonincreasing with respect to inclusion, and is “left continuous” in the following sense. If $r > 0$, one has

$$C_r = \bigcap_{s < r} C_s.$$

Let us then define

$$R := \sup\{r \mid C_r \ne \emptyset\},$$

so that $R \in [\rho(A), \|A\|_1]$. The monotonicity with respect to inclusion shows that $r < R$ implies $C_r \ne \emptyset$.

If $x \ge 0$ and $\|x\|_1 = 1$, then $Ax \ge 0$ and $Ax \ne 0$, since $A$ is nonnegative and irreducible. Applying Lemma 5.3.1 below with $r = 0$, it follows that $R > 0$. The set $C_R = \bigcap_{r < R} C_r$, being the intersection of a totally ordered family of nonempty compact sets, is nonempty.

Let $x \in C_R$. Lemma 5.3.1 below shows that $x$ is an eigenvector of $A$ associated to the eigenvalue $R$: otherwise some $C_{r'}$ with $r' > R$ would be nonempty. This eigenvalue is not less than $\rho(A)$, and not greater either by definition of the spectral radius; hence $\rho(A) = R$. Therefore $\rho(A)$ is an eigenvalue associated to the eigenvector $x$, and $\rho(A) > 0$. Lemma 5.3.2 below ensures that $x > 0$.

The proof of the simplicity of the eigenvalue $\rho(A)$ will be given in Section 5.3.3.
5.3.2 A Few Lemmas

Lemma 5.3.1 Let $r \ge 0$ and $x \ge 0$ such that $Ax \ge rx$ and $Ax \ne rx$. Then there exists $r' > r$ such that $C_{r'}$ is nonempty.

Proof

Let $y := (I_n + A)^{n-1} x$. Since $A$ is irreducible and $x \ge 0$ is nonzero (because $Ax \ne rx$), Propositions 5.1.1 and 5.1.2 give $y > 0$. Similarly, $Ay - ry = (I_n + A)^{n-1}(Ax - rx) > 0$. Let us define $r' := \min_j (Ay)_j / y_j$, which is strictly larger than $r$. We then have $Ay \ge r'y$, so that $C_{r'}$ contains the vector $y / \|y\|_1$.



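Lemma 5.3.2 The nonnegative eigenvectors of $A$ are positive.

Proof

Given such a vector $x$ with $Ax = \lambda x$, we observe that $\lambda \in \mathbb{R}_+$, since $\lambda x = Ax \ge 0$ with $x \ge 0$, $x \ne 0$. Then

$$(1 + \lambda)^{n-1} x = (I_n + A)^{n-1} x,$$

and the right-hand side is strictly positive, from Propositions 5.1.1 and 5.1.2.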



Finally, we can state the following result.

Lemma 5.3.3 Let $A, B \in M_n(\mathbb{C})$ be matrices, with $A$ irreducible and $|B| \le A$. Then $\rho(B) \le \rho(A)$.

In case of equality ($\rho(B) = \rho(A)$), the following hold:

• $|B| = A$;

• for every eigenvector $x$ of $B$ associated to an eigenvalue of modulus $\rho(A)$, $|x|$ is an eigenvector of $A$ associated to $\rho(A)$.

Proof

In order to establish the inequality, we proceed as above. If $\lambda$ is an eigenvalue of $B$, of modulus $\rho(B)$, and if $x$ is a normalized eigenvector, then $\rho(B)|x| \le |B| \cdot |x| \le A|x|$, so that $C_{\rho(B)}$ is nonempty. Hence $\rho(B) \le R = \rho(A)$.

Let us investigate the case of equality. If $\rho(B) = \rho(A)$, then $|x| \in C_{\rho(A)} = C_R$, and therefore, as in the proof of Theorem 5.3.1, $|x|$ is an eigenvector: $A|x| = \rho(A)|x| = \rho(B)|x| \le |B| \cdot |x|$. Hence $(A - |B|)|x| \le 0$. Since $|x| > 0$ (from Lemma 5.3.2) and $A - |B| \ge 0$, this gives $|B| = A$.



5.3.3 The Eigenvalue $\rho(A)$ Is Simple

Let $P_A(X)$ be the characteristic polynomial of $A$. It is given as the composition of an $n$-linear form (the determinant) with polynomial vector-valued functions (the columns of $XI_n - A$). If $\phi$ is $p$-linear and if $V_1(X), \dots, V_p(X)$ are polynomial vector-valued functions, then the polynomial $P(X) := \phi(V_1(X), \dots, V_p(X))$ has the derivative

$$P'(X) = \phi(V_1', V_2, \dots, V_p) + \phi(V_1, V_2', \dots, V_p) + \cdots + \phi(V_1, \dots, V_{p-1}, V_p').$$

One therefore has

$$P_A'(X) = \det(e^1, a_2, \dots, a_n) + \det(a_1, e^2, \dots, a_n) + \cdots + \det(a_1, \dots, a_{n-1}, e^n),$$

where $a_j$ is the $j$th column of $XI_n - A$ and $\{e^1, \dots, e^n\}$ is the canonical basis of $\mathbb{R}^n$. Developing the $j$th determinant with respect to the $j$th column, one obtains

$$P_A'(X) = \sum_{j=1}^{n} P_{A_j}(X), \qquad (5.1)$$

where $A_j \in M_{n-1}(\mathbb{R})$ is obtained from $A$ by deleting the $j$th row and the $j$th column. Let us now denote by $B_j \in M_n(\mathbb{R})$ the matrix obtained from $A$ by replacing the entries of the $j$th row and column by zeroes. Up to a permutation of coordinates, this matrix is block-diagonal, the two diagonal blocks being $A_j \in M_{n-1}(\mathbb{R})$ and $0 \in M_1(\mathbb{R})$. Hence, the eigenvalues of $B_j$ are those of $A_j$, together with zero, and therefore $\rho(B_j) = \rho(A_j)$. Furthermore, $|B_j| \le A$, but $|B_j| \ne A$ because $A$ is irreducible and $B_j$ is block-diagonal, hence reducible. It follows (Lemma 5.3.3) that $\rho(B_j) < \rho(A)$. Hence $P_{A_j}$, a real monic polynomial whose roots all have modulus at most $\rho(A_j) < \rho(A)$, does not vanish at $\rho(A)$, and $P_{A_j}(\rho(A))$ has the same sign as $P_{A_j}(X)$ in a neighborhood of $+\infty$, which is positive. Finally, $P_A'(\rho(A))$ is positive and $\rho(A)$ is a simple root.

This completes the proof of Theorem 5.3.1. A different proof of the simplicity and another proof of the Perron-Frobenius theorem are given in Exercises 2 and 4.
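Formula (5.1) can be checked numerically: the derivative of $P_A$ at a point $t$ should equal the sum of the characteristic polynomials of the principal submatrices $A_j$ evaluated at $t$. In the sketch below (plain Python; names are ours), $P_A'(t)$ is approximated by a centered difference quotient with error $O(h^2)$, and determinants are computed by cofactor expansion, which is only reasonable for small $n$.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def char_poly(A, t):
    """P_A(t) = det(t I - A)."""
    n = len(A)
    return det([[(t if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)])


def principal_submatrix(A, j):
    """A_j: delete the j-th row and the j-th column of A."""
    return [row[:j] + row[j + 1:] for i, row in enumerate(A) if i != j]


def check_formula_5_1(A, t, h=1e-5):
    """Return (approximate P_A'(t), sum_j P_{A_j}(t))."""
    lhs = (char_poly(A, t + h) - char_poly(A, t - h)) / (2 * h)
    rhs = sum(char_poly(principal_submatrix(A, j), t) for j in range(len(A)))
    return lhs, rhs


A = [[1.0, 2.0, 0.0], [3.0, 1.0, 1.0], [0.0, 2.0, 4.0]]
print(check_formula_5_1(A, 2.5))    # the two numbers agree up to O(h^2)
```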



5.4 Cyclic Matrices 

The following statement completes Theorem 5.3.1. 

Theorem 5.4.1 Under the assumptions of Theorem 5.3.1, the set $R(A)$ of eigenvalues of $A$ of maximal modulus $\rho(A)$ is of the form $R(A) = \rho(A)U_p$, where $U_p$ is the group of $p$th roots of unity, $p$ being the cardinality of $R(A)$. Every such eigenvalue is simple. The spectrum of $A$ is invariant under multiplication by $U_p$. Finally, $A$ is similar, by means of a permutation of coordinates in $\mathbb{R}^n$, to the following cyclic form. In this cyclic matrix each element is a block, and the diagonal blocks (which all vanish) are square with nonzero sizes:

$$\begin{pmatrix} 0 & M_1 & 0 & \cdots & 0 \\ 0 & 0 & M_2 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & & & 0 & M_{p-1} \\ M_p & 0 & \cdots & 0 & 0 \end{pmatrix}.$$
Remarks:

• The converse is true. For example, the spectrum of a cyclic matrix is stable under multiplication by $\exp(2i\pi/p)$.

• One may show that $p$ divides $n - n_0$, where $n_0$ is the multiplicity of the zero eigenvalue.

• The nonzero eigenvalues of $A$ are the $p$th roots of those of the matrix $M_1 M_2 \cdots M_p$, which is square, though its factors might not be square.

Proof

Let us denote by $X$ the unique nonnegative eigenvector of $A$ normalized by $\|X\|_1 = 1$. If $Y$ is an eigenvector normalized by $\|Y\|_1 = 1$, associated to an eigenvalue $\mu$ of maximal modulus $\rho(A)$, the inequality $\rho(A)|Y| = |AY| \le A|Y|$ implies (Lemma 5.3.3) $|Y| = X$. Hence there is a diagonal matrix $D = \operatorname{diag}(e^{i\alpha_1}, \dots, e^{i\alpha_n})$ such that $Y = DX$. Let us define a unimodular complex number $e^{i\gamma} := \mu/\rho(A)$ and let $B$ be the matrix $e^{-i\gamma} D^{-1} A D$. One has $|B| = A$ and $BX = \rho(A)X = AX$. For every $j$, one therefore has

$$\sum_{k=1}^{n} b_{jk} X_k = \sum_{k=1}^{n} |b_{jk}| X_k.$$

Since $X > 0$, one deduces that $B$ is real-valued and nonnegative; that is, $B = A$. Hence $D^{-1} A D = e^{i\gamma} A$. The spectrum of $A$ is thus invariant under multiplication by $e^{i\gamma}$.

Let $U = \rho(A)^{-1} R(A)$, which is included in the unit circle. The previous discussion shows that $U$ is stable under multiplication. Since $U$ is finite, it follows that its elements are roots of unity. Since the inverse of a $d$th root of unity is its own $(d-1)$th power, $U$ is stable under inversion. Hence it is a finite subgroup of the unit circle; that is, it is $U_p$, for a suitable $p$.

Let $P_A$ be the characteristic polynomial and let $\omega = \exp(2i\pi/p)$. One may apply the first part of the proof to $\mu = \omega\rho(A)$. One has thus $D^{-1} A D = \omega A$, and it follows that $P_A(X) = \omega^n P_A(X/\omega)$. Therefore, multiplication by $\omega$ sends eigenvalues to eigenvalues of the same multiplicities. In particular, the eigenvalues of maximal modulus are simple, since $\rho(A)$ is.

Iterating the conjugation, one obtains $D^{-p} A D^{p} = A$. Let us set

$$D^p = \operatorname{diag}(d_1, \dots, d_n).$$

One has thus $d_j = d_k$, provided that $a_{jk} \ne 0$. Since $A$ is irreducible, one can link any two indices $j$ and $k$ by a chain $j_0 = j, \dots, j_r = k$ such that $a_{j_{s-1}, j_s} \ne 0$ for every $s$. It follows that $d_j = d_k$ for every $j, k$. But since one may choose $Y_1 = X_1$, that is, $\alpha_1 = 0$, one also has $d_1 = 1$ and hence $D^p = I_n$. The $e^{i\alpha_j}$ are thus $p$th roots of unity. With a conjugation by a permutation matrix we may limit ourselves to the case where $D$ has the block-diagonal form $\operatorname{diag}(J_0, \omega J_1, \dots, \omega^{p-1} J_{p-1})$, where the $J_l$ are identity matrices of respective sizes $n_0, \dots, n_{p-1}$. Decomposing $A$ into blocks $A_{lm}$ of sizes $n_l \times n_m$, one obtains $\omega^{k} A_{jk} = \omega^{j+1} A_{jk}$ directly from the conjugation identity. Hence $A_{jk} = 0$ except for the pairs $(j, k)$ of the form $(0, 1), (1, 2), \dots, (p-2, p-1), (p-1, 0)$. This is the announced cyclic form.
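The key identity $D^{-1}AD = \omega A$ of the proof can be checked on an example. Writing $D = \operatorname{diag}(\omega^{l_1}, \dots, \omega^{l_n})$ with levels $l_j \in \mathbb{Z}/p\mathbb{Z}$, the identity amounts to $l_k = l_j + 1$ whenever $a_{jk} \ne 0$. The sketch below (plain Python; names are ours) fixes the level of the first index to $0$, as in the normalization $\alpha_1 = 0$, propagates these constraints along the nonzero entries (which reaches every index when $A$ is irreducible), reports failure if they are inconsistent, and then verifies the identity. The $4 \times 4$ example is in cyclic form with $p = 3$ and blocks of sizes $1, 2, 1$.

```python
import cmath
from collections import deque


def cyclic_levels(A, p):
    """Levels l_j in Z/pZ with l_k = l_j + 1 (mod p) whenever a_jk != 0.

    Returns None if the constraints are inconsistent or some index is
    unreachable from index 0.
    """
    n = len(A)
    level = [None] * n
    level[0] = 0                      # normalization alpha_1 = 0
    queue = deque([0])
    while queue:
        j = queue.popleft()
        for k in range(n):
            if A[j][k] != 0:
                want = (level[j] + 1) % p
                if level[k] is None:
                    level[k] = want
                    queue.append(k)
                elif level[k] != want:
                    return None
    return None if None in level else level


def conjugation_holds(A, p, tol=1e-12):
    """Check D^{-1} A D = omega A with D = diag(omega ** l_j), omega = e^{2i pi/p}."""
    level = cyclic_levels(A, p)
    if level is None:
        return False
    omega = cmath.exp(2j * cmath.pi / p)
    d = [omega ** l for l in level]
    n = len(A)
    return all(abs(A[j][k] * d[k] / d[j] - omega * A[j][k]) <= tol
               for j in range(n) for k in range(n))


A = [[0, 1, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 4],
     [5, 0, 0, 0]]          # blocks of sizes 1, 2, 1 in cyclic form (p = 3)
print(cyclic_levels(A, 3), conjugation_holds(A, 3), conjugation_holds(A, 2))
# [0, 1, 1, 2] True False
```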



5.5 Stochastic Matrices 

Definition 5.5.1 A matrix $M \in M_n(\mathbb{R})$ is said to be stochastic if $M \ge 0$ and if for every $i = 1, \dots, n$, one has

$$\sum_{j=1}^n m_{ij} = 1.$$

One says that $M$ is bistochastic (or doubly stochastic) if both $M$ and $M^T$ are stochastic.

Denoting by $e \in \mathbb{R}^n$ the vector all of whose coordinates equal one, one sees that $M$ is stochastic if and only if $M \ge 0$ and $Me = e$. Moreover, $M$ is bistochastic if and only if $M \ge 0$, $Me = e$, and $e^TM = e^T$. If $M$ is stochastic, one has $\|Mx\|_\infty \le \|x\|_\infty$ for every $x \in \mathbb{C}^n$, and therefore $\rho(M) \le 1$. But since $Me = e$, one has in fact $\rho(M) = 1$.
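The characterizations $M \ge 0$, $Me = e$ (and $e^TM = e^T$ for bistochastic matrices) are easy to test numerically. The sketch below uses floating-point tolerances (a choice made for the illustration) and an arbitrary example matrix; it also checks that $\rho(M) = 1$ and that $\|Mx\|_\infty \le \|x\|_\infty$.

```python
import numpy as np

def is_stochastic(M, tol=1e-12):
    """M >= 0 entrywise and every row sums to 1 (i.e. Me = e)."""
    M = np.asarray(M, dtype=float)
    return bool(np.all(M >= -tol) and np.allclose(M.sum(axis=1), 1.0, rtol=0.0, atol=tol))

def is_bistochastic(M, tol=1e-12):
    """Both M and its transpose are stochastic."""
    return is_stochastic(M, tol) and is_stochastic(np.asarray(M, dtype=float).T, tol)

# Example of a stochastic matrix that is not bistochastic (column sums 0.7, 0.9, 1.4).
M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.1, 0.9]])
rho = max(abs(np.linalg.eigvals(M)))                 # spectral radius
x = np.array([3.0, -1.0, 2.0])
print(is_stochastic(M), is_bistochastic(M), rho)
print(np.max(np.abs(M @ x)) <= np.max(np.abs(x)))    # ||Mx||_inf <= ||x||_inf
```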

The stochastic matrices play an important role in the study of Markov chains. A special case of a bistochastic matrix is a permutation matrix $P(\sigma)$ ($\sigma \in S_n$), whose entries are $p_{ij} = \delta_{i\sigma(j)}$, where $\delta$ is the Kronecker symbol.

The following theorem explains the role of permutation matrices. 

Theorem 5.5.1 (Birkhoff) A matrix $M \in M_n(\mathbb{R})$ is bistochastic if and only if it is a center of mass (that is, a barycenter with nonnegative weights) of permutation matrices.

The fact that a center of mass of permutation matrices is a doubly stochastic matrix is obvious, since the set $\Delta_n$ of doubly stochastic matrices is convex. The interest of the theorem lies in the statement that if $M \in \Delta_n$, there exist permutation matrices $P_1, \dots, P_r$ and positive real numbers $\alpha_1, \dots, \alpha_r$ with $\alpha_1 + \cdots + \alpha_r = 1$ such that $M = \alpha_1P_1 + \cdots + \alpha_rP_r$.
Let us recall that a point $x$ of a convex set $C$ is an extreme point if the equality $x = \theta y + (1-\theta)z$, with $y, z \in C$ and $\theta \in (0,1)$, implies $y = z = x$. The Krein-Milman theorem (see [30], Theorem 3.23) says that a convex compact subset of $\mathbb{R}^N$ is the convex hull, that is, the set of centers of mass, of its extreme points. Since $\Delta_n$ is closed and bounded, hence compact, it is permissible to apply the Krein-Milman theorem.

Proof 

To begin with, it is immediate that the permutation matrices are extreme points of $\Delta_n$. From the Krein-Milman theorem, the proof amounts to showing that there is no other extreme point in $\Delta_n$.

Let $M \in \Delta_n$ be given. If $M$ is not a permutation matrix, there exists an entry $m_{i_1j_1} \in (0,1)$. Since $M$ is stochastic, there also exists $j_2 \ne j_1$ such that $m_{i_1j_2} \in (0,1)$. Since $M^T$ is stochastic, there exists $i_2 \ne i_1$ such that $m_{i_2j_2} \in (0,1)$. By this procedure one constructs a sequence $(j_1, i_1, j_2, i_2, \dots)$ such that $m_{i_lj_l} \in (0,1)$ and $m_{i_lj_{l+1}} \in (0,1)$. Since the set of indices is finite, it eventually happens that one of the indices (a row index or a column index) is repeated.

Therefore, one can assume that the sequence $(j_1, i_1, \dots, j_r, i_r, j_{r+1} = j_1)$ has the above property, the indices $i_1, \dots, i_r$ being pairwise distinct, as well as $j_1, \dots, j_r$. Let us define a matrix $B \in M_n(\mathbb{R})$ by $b_{i_lj_l} = 1$, $b_{i_lj_{l+1}} = -1$ for $1 \le l \le r$, and $b_{ij} = 0$ otherwise. By construction, $Be = 0$ and $e^TB = 0$. If $\alpha \in \mathbb{R}$, one therefore has $(M \pm \alpha B)e = e$ and $e^T(M \pm \alpha B) = e^T$. If $\alpha > 0$ is small enough, $M \pm \alpha B$ turns out to be nonnegative. Finally, $M + \alpha B$ and $M - \alpha B$ are bistochastic, and

$$M = \frac{1}{2}(M - \alpha B) + \frac{1}{2}(M + \alpha B).$$

Hence $M$ is not an extreme point of $\Delta_n$.

■ 
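In practice, a Birkhoff decomposition can be computed greedily. The sketch below does not follow the Krein-Milman argument of the proof; it uses a standard procedure (an assumption of this illustration, not taken from the text): repeatedly find a permutation all of whose positions carry positive entries, via Kuhn's augmenting-path matching, and subtract as much of that permutation matrix as possible. Exact rationals avoid rounding problems. The convention `sigma[i]` = column of the 1 in row `i` is chosen for the sketch.

```python
from fractions import Fraction

def perfect_matching(support):
    """Kuhn's augmenting-path algorithm on the bipartite graph rows -> columns.
    Returns sigma with support[i][sigma[i]] True for every row i, or None."""
    n = len(support)
    row_of_col = [-1] * n

    def augment(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if row_of_col[j] == -1 or augment(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None
    sigma = [0] * n
    for j, i in enumerate(row_of_col):
        sigma[i] = j
    return sigma

def birkhoff(M):
    """Return [(w, sigma), ...] with M = sum of w * P, row i of P having its 1 in column sigma[i]."""
    R = [[Fraction(x) for x in row] for row in M]
    n = len(R)
    terms = []
    while any(x != 0 for row in R for x in row):
        sigma = perfect_matching([[x > 0 for x in row] for row in R])
        if sigma is None:
            raise ValueError("M is not bistochastic")
        w = min(R[i][sigma[i]] for i in range(n))
        for i in range(n):
            R[i][sigma[i]] -= w          # at least one entry drops to zero
        terms.append((w, sigma))
    return terms

def recompose(terms, n):
    """Sum of w * P(sigma), to check the decomposition."""
    S = [[Fraction(0)] * n for _ in range(n)]
    for w, sigma in terms:
        for i in range(n):
            S[i][sigma[i]] += w
    return S

M = [["1/2", "1/2", 0], ["1/4", "1/4", "1/2"], ["1/4", "1/4", "1/2"]]
terms = birkhoff(M)
print(terms)
```

Each step zeroes at least one entry, so the loop stops after at most $n^2$ steps; the remainder is always a multiple of a bistochastic matrix, whose support contains a permutation, so the matching step does not fail on bistochastic input.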

Here is a nontrivial consequence (Stoer and Witzgall [32]): 

Corollary 5.5.1 Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$, invariant under permutation of the coordinates. Then $\|M\| = 1$ for every bistochastic matrix (where by abuse of notation we have used $\|\cdot\|$ for the induced norm on $M_n(\mathbb{R})$).

Proof

To begin with, $\|P\| = 1$ for every permutation matrix, by assumption. Since the induced norm is convex (true for every norm), one deduces from Birkhoff's theorem that $\|M\| \le 1$ for every bistochastic matrix. Furthermore, $Me = e$ implies $\|M\| \ge \|Me\|/\|e\| = 1$.

■ 

This result applies, for instance, to the norm $\|\cdot\|_p$, providing a nontrivial convex set on which the map $1/p \mapsto \log\|M\|_p$ is constant (compare with Theorem 4.3.1).
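A quick numerical check of Corollary 5.5.1 for $p = 1, 2, \infty$: the sketch builds a random center of mass of permutation matrices (random weights and permutations are arbitrary choices) and evaluates the induced norms with `numpy.linalg.norm`, whose matrix `ord` values 1, 2 and `inf` give the maximum column sum, the largest singular value and the maximum row sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 5
weights = rng.dirichlet(np.ones(r))              # positive weights summing to 1
perms = [rng.permutation(n) for _ in range(r)]
# A center of mass of permutation matrices is bistochastic.
M = sum(w * np.eye(n)[perm] for w, perm in zip(weights, perms))

# Induced norms: p = 1 (max column sum), p = 2 (largest singular value), p = inf (max row sum).
norms = {p: np.linalg.norm(M, p) for p in (1, 2, np.inf)}
print(norms)                                      # all values equal 1 up to rounding
```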

The bistochastic matrices are intimately related to the relation $\prec$ (see Section 3.4). In fact, we have the following theorem.

Theorem 5.5.2 A matrix $A$ is bistochastic if and only if $Ax \succ x$ for every $x \in \mathbb{R}^n$.

Proof 

If $A$ is bistochastic, then

$$\|Ax\|_1 \le \|A\|_1\|x\|_1 = \|x\|_1,$$

since $A^T$ is stochastic. Since $A$ is stochastic, $Ae = e$. Applying this inequality to $x - te$, one therefore has $\|Ax - te\|_1 \le \|x - te\|_1$ for every $t \in \mathbb{R}$. Proposition 3.4.1 then shows that $x \prec Ax$.

Conversely, let us assume that $x \prec Ax$ for every $x \in \mathbb{R}^n$. Choosing $x$ as the $j$th vector of the canonical basis, $e^j$, the inequality $s_1(e^j) \le s_1(Ae^j)$ expresses that the $j$th column of $A$ is nonnegative, hence that $A$ is a nonnegative matrix, while $s_n(e^j) = s_n(Ae^j)$ yields

$$\sum_{i=1}^n a_{ij} = 1. \qquad (5.2)$$

One then chooses $x = e$. The inequality $s_1(e) \le s_1(Ae)$ expresses¹ that $Ae \ge e$. Finally, $s_n(e) = s_n(Ae)$ and $Ae \ge e$ give $Ae = e$. Hence, $A$ is bistochastic.

■ 
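Theorem 5.5.2 can be illustrated numerically. The proof reads $x \prec y$ through the partial sums $s_k$; judging from how they are used above (for instance, $s_1(e^j) \le s_1(Ae^j)$ expresses nonnegativity of a column), this sketch assumes that $s_k(x)$ is the sum of the $k$ smallest coordinates of $x$ and that $x \prec y$ means $s_k(x) \le s_k(y)$ for $k < n$ with $s_n(x) = s_n(y)$. The example matrices are arbitrary.

```python
import numpy as np

def s(x):
    """s_k(x), k = 1..n: sums of the k smallest coordinates (assumed reading of Section 3.4)."""
    return np.cumsum(np.sort(np.asarray(x, dtype=float)))

def precedes(x, y, tol=1e-9):
    """x < y (majorization as assumed above), with a floating-point tolerance."""
    sx, sy = s(x), s(y)
    return bool(np.all(sx[:-1] <= sy[:-1] + tol) and abs(sx[-1] - sy[-1]) <= tol)

A = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.25, 0.5],
              [0.25, 0.25, 0.5]])                 # bistochastic
rng = np.random.default_rng(3)
print(all(precedes(x, A @ x) for x in rng.normal(size=(100, 3))))   # x < Ax for every x tried

S = np.array([[1.0, 0.0], [1.0, 0.0]])            # stochastic, but not bistochastic
print(precedes(np.array([1.0, 0.0]), S @ np.array([1.0, 0.0])))     # fails: s_n differ
```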

This statement is completed by the following. 

Theorem 5.5.3 Let $x, y \in \mathbb{R}^n$. Then $x \prec y$ if and only if there exists a bistochastic matrix $A$ such that $y = Ax$.

Proof

From the previous theorem, it is enough to show that if $x \prec y$, there exists a bistochastic matrix $A$ such that $y = Ax$. To do so, one applies Theorem 3.4.2: There exists a Hermitian matrix $H$ whose diagonal and spectrum are $y$ and $x$, respectively. Let us diagonalize $H$ by a unitary conjugation: $H = U^*DU$, with $D = \mathrm{diag}(x_1, \dots, x_n)$. Then $y = Ax$, where $a_{ij} = |u_{ji}|^2$. Since $U$ is unitary, $A$ is bistochastic.²

■ 
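The construction of the proof can be reproduced in the real case, where a real orthogonal matrix plays the role of $U$. In the sketch below the data are random, and producing $U$ from a QR factorization is merely a convenient choice. It builds $H = U^TDU$ with prescribed spectrum $x$ and checks that the diagonal of $H$ equals $Ax$ for the orthostochastic matrix $a_{ij} = u_{ji}^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
x = rng.normal(size=n)                           # prescribed spectrum
U, _ = np.linalg.qr(rng.normal(size=(n, n)))     # real orthogonal, hence unitary
H = U.T @ np.diag(x) @ U                         # H = U* D U has spectrum x
y = np.diag(H).copy()                            # diagonal of H
A = (U ** 2).T                                   # a_ij = |u_ji|^2

print(np.allclose(A @ x, y))                                          # y = A x
print(np.allclose(A.sum(axis=0), 1), np.allclose(A.sum(axis=1), 1))   # A is bistochastic
```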

An important aspect of stochastic matrices is their action on the simplex

$$K_n := \Big\{x \in \mathbb{R}^n \;\Big|\; x \ge 0,\ \sum_{i=1}^n x_i = 1\Big\}.$$

It is clear that $M^T$ is stochastic if and only if $M(K_n)$ is contained in $K_n$; $M$ is bistochastic if, moreover, $Me = e$.

Considered as a part of the affine subspace whose equation is $\sum_i x_i = 1$, $K_n$ is a convex set with a nonempty interior. Its interior comprises those points that satisfy $x > 0$. One denotes by $\partial K_n$ the boundary of $K_n$. If $x \in K_n$, we denote by $Z(x)$ the set of indices $i$ such that $x_i = 0$, and by $o(x)$ its cardinality, in such a way that $\partial K_n$ comprises those points satisfying $o(x) \ge 1$. For $M \in \Delta_n$, one always has $m_{ij} = 0$ for $(i,j) \in Z(Mx) \times Z(x)^c$, where $I^c$ denotes the complement of $I$ in $\{1, \dots, n\}$.

¹ For another vector $y$, $s_1(y) \le s_1(Ay)$ does not imply $Ay \ge y$.
² This kind of bistochastic matrix is called orthostochastic.

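Proposition 5.5.1 Let $x \in K_n$ and $M \in \Delta_n$ be given. Then one has

$$o(Mx) \le o(x).$$

Moreover, if $o(Mx) = o(x)$, one has $m_{ij} = 0$ for every $(i,j) \in Z(Mx)^c \times Z(x)$.

Proof

Let us compute, using that the columns and the rows of $M$ sum to one, and then that $m_{ij} = 0$ on $Z(Mx) \times Z(x)^c$:

$$o(x) - o(Mx) = \sum_{j \in Z(x)}\sum_{i=1}^n m_{ij} - \sum_{i \in Z(Mx)}\sum_{j=1}^n m_{ij} = \sum_{(i,j) \in Z(Mx)^c \times Z(x)} m_{ij} \ge 0.$$

The case of equality is immediate.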

■ 

We could have obtained the first part of the proposition by applying Theorem 5.5.2.

Corollary 5.5.2 Let $I$ and $J$ be two subsets of $\{1, \dots, n\}$ and let $M \in \Delta_n$ be a matrix satisfying $m_{ij} = 0$ for every $(i,j) \in I \times J^c$. Then one has $|J| \ge |I|$. If, moreover, $|I| = |J|$, then $m_{ij}$ also vanishes for $(i,j) \in I^c \times J$.

Proof

It is sufficient to choose $x \in K_n$ with $Z(x) = J$ if $J^c$ is nonempty: then $I \subset Z(Mx)$, and Proposition 5.5.1 applies. If $J^c$ is empty, the statement is obvious.

■ 

We shall denote by $S\Delta_n$ ($S$ for strict) the set of doubly stochastic matrices $M$ for which the conditions $|I| = |J|$ and $m_{ij} = 0$ for every $(i,j) \in I \times J^c$ imply either $I = \emptyset$ or $I = \{1, \dots, n\}$. These are also the matrices for which $x \in \partial K_n$ implies $o(Mx) < o(x)$. This set does not contain permutation matrices $P$, since these satisfy $o(Px) = o(x)$ for every $x \in K_n$.
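Since the zero pattern of $Mx$ depends only on the support of $x$, it suffices to test one point of $\partial K_n$ per proper nonempty support. The sketch below does so with exact rationals for the $4 \times 4$ tridiagonal matrix of Exercise 15 (which that exercise places in $S\Delta_n$) and for a cyclic permutation matrix. The helper names are ours.

```python
from fractions import Fraction as F
from itertools import combinations

def o(x):
    """Number of vanishing coordinates of x."""
    return sum(1 for xi in x if xi == 0)

def apply(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def boundary_points(n):
    """Uniform probability vectors on each proper nonempty subset: points of the boundary of K_n."""
    for size in range(1, n):
        for S in combinations(range(n), size):
            yield [F(1, size) if i in S else F(0) for i in range(n)]

h = F(1, 2)
T = [[h, h, 0, 0], [h, 0, h, 0], [0, h, 0, h], [0, 0, h, h]]   # matrix of Exercise 15, n = 4
P = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]   # a cyclic permutation matrix

print(all(o(apply(T, x)) < o(x) for x in boundary_points(4)))   # strict decrease
print(all(o(apply(P, x)) == o(x) for x in boundary_points(4)))  # equality for a permutation
```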

Let $M \in \Delta_n$ be given. A decomposition of $M$ consists of two partitions $I_1 \cup \cdots \cup I_r$ and $J_1 \cup \cdots \cup J_r$ of the set $\{1, \dots, n\}$ such that

$$(i \in I_l,\ j \in J_m,\ l \ne m) \implies m_{ij} = 0.$$

From Corollary 5.5.2, we have $|I_l| = |J_l|$ for every $l$. Eliminating empty parts if necessary, we can always assume that none of the $I_l$'s or $J_l$'s is empty. A decomposition of $M$ furnishes a block structure, in which each row-block has only one nonzero block, and the same for the column-blocks. The blocks of indices $I_l \times J_l$ are themselves stochastic matrices. A matrix of $S\Delta_n$ admits only the trivial decomposition $r = 1$, $I_1 = J_1 = \{1, \dots, n\}$.

If $M$ admits two decompositions, one with the sets $I_l, J_l$, $1 \le l \le r$, the other one with $I'_m, J'_m$, $1 \le m \le s$, let us form the partitions $\bigcup_{l,m} I''_{lm}$ and $\bigcup_{l,m} J''_{lm}$, with $I''_{lm} := I_l \cap I'_m$ and $J''_{lm} := J_l \cap J'_m$. If $i \in I''_{lm}$ and $j \in J''_{pq}$, with $(l,m) \ne (p,q)$, we have $m_{ij} = 0$. From Corollary 5.5.2, applied to $M$ and to its transposition, we have $|I''_{lm}| = |J''_{lm}|$. Eliminating the empty parts, we obtain therefore a decomposition of $M$ that is finer than the first two, in the sense of inclusion order: Each $I_l$ (or $J_l$) is a union of some parts of the form $I''_{lm}$ (or $J''_{lm}$).

Since the set of decompositions of $M$ is finite, the previous argument shows that there exists a finest one. We shall call it the canonical decomposition of $M$. It is the only decomposition for which the blocks of indices $I_l \times J_l$ are themselves of class $S\Delta$.
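Exercise 15 below asserts that the parts $I_l$ of the canonical decomposition are the equivalence classes of the relation $\mathcal{R}$ built from chains of positive entries. Taking that characterization for granted, the canonical decomposition is given by the connected components of the bipartite graph with an edge $i - j$ whenever $m_{ij} > 0$. The sketch below computes them with a union-find structure (an implementation choice; the function name is ours).

```python
def canonical_decomposition(M):
    """Blocks (I_l, J_l) of a bistochastic M, computed as the connected components
    of the bipartite graph with an edge i -- j whenever m_ij > 0."""
    n = len(M)
    parent = list(range(2 * n))          # vertices 0..n-1: rows, n..2n-1: columns

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for i in range(n):
        for j in range(n):
            if M[i][j] > 0:
                parent[find(i)] = find(n + j)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), ([], []))[0].append(i)
    for j in range(n):
        blocks.setdefault(find(n + j), ([], []))[1].append(j)
    return sorted(blocks.values())

M = [[0, 0.5, 0.5, 0],
     [0, 0.5, 0.5, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1]]
print(canonical_decomposition(M))   # [([0, 1], [1, 2]), ([2], [0]), ([3], [3])]
```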



5.6 Exercises 

1. We consider the following three properties for a matrix $M \in M_n(\mathbb{R})$.

P1 $M$ is nonnegative.

P2 $M^Te = e$, where $e = (1, \dots, 1)^T$.

P3 $\|M\|_1 \le 1$.

(a) Show that P2 and P3 imply P1.

(b) Show that P2 and P1 imply P3.

(c) Do P1 and P3 imply P2?

2. Here is another proof of the simplicity of $\rho(A)$ in the Perron-Frobenius theorem, which does not require Lemma 5.3.3.

(a) We assume that $A$ is irreducible and nonnegative, and we denote by $x$ a positive eigenvector associated to the eigenvalue $\rho(A)$. Let $K$ be the set of nonnegative eigenvectors $y$ associated to $\rho(A)$ such that $\|y\|_1 = 1$. Show that $K$ is compact and convex.

(b) Show that the geometric multiplicity of $\rho(A)$ equals 1 (Hint: Otherwise, $K$ would contain a vector with at least one zero component.)

(c) Show that the algebraic multiplicity of $\rho(A)$ equals 1 (Hint: Otherwise, there would be a nonnegative vector $y$ such that $Ay - \rho(A)y = x > 0$.)

3. Let $M \in M_n(\mathbb{R})$ be either a strictly diagonally dominant, or an irreducible strongly diagonally dominant, matrix. Assume that $m_{jj} > 0$ for every $j = 1, \dots, n$ and $m_{ij} \le 0$ otherwise. Show that $M$ is invertible and that the solution of $Mx = b$, when $b \ge 0$, satisfies $x \ge 0$. Deduce that $M^{-1} \ge 0$.

4. Here is another proof of Theorem 5.3.1, due to Perron himself. We proceed by induction on the size $n$ of the matrix. The statement is obvious if $n = 1$. We therefore assume that it holds for matrices of size $n$. We give ourselves an irreducible nonnegative matrix $A \in M_{n+1}(\mathbb{R})$, which we decompose blockwise as

$$A = \begin{pmatrix} a & \xi^T \\ \eta & B \end{pmatrix}, \qquad a \in \mathbb{R},\ \xi, \eta \in \mathbb{R}^n,\ B \in M_n(\mathbb{R}).$$

(a) Applying the induction hypothesis to the matrix $B + \epsilon J$, where $\epsilon > 0$ and $J > 0$ is a positive matrix, then letting $\epsilon$ go to zero, show that $\rho(B)$ is an eigenvalue of $B$, associated to a nonnegative eigenvector (this avoids the use of Theorem 5.2.1).

(b) Using the formula

$$(\lambda I_n - B)^{-1} = \sum_{k=1}^{\infty} \lambda^{-k}B^{k-1},$$

valid for $\lambda \in (\rho(B), +\infty)$, deduce that the function $h(\lambda) := \lambda - a - \xi^T(\lambda I_n - B)^{-1}\eta$ is strictly increasing on this interval and that on the same interval the vector $x(\lambda) := (\lambda I_n - B)^{-1}\eta$ is positive.

(c) Prove the relation $P_A(\lambda) = P_B(\lambda)h(\lambda)$ between the characteristic polynomials.

(d) Deduce that the matrix $A$ has one and only one eigenvalue in $(\rho(B), +\infty)$, and that it is a simple one, associated to a positive eigenvector. One denotes this eigenvalue by $\lambda_0$.

(e) Applying the previous results to $A^T$, show that there exists $\ell \in \mathbb{R}^{n+1}$ such that $\ell > 0$ and $\ell^T(A - \lambda_0 I_{n+1}) = 0$.

(f) Let $\mu$ be an eigenvalue of $A$, associated to an eigenvector $X$. Show that $(\lambda_0 - |\mu|)\,\ell^T|X| \ge 0$. Conclusion?

5. Let $A \in M_n(\mathbb{R})$ be a matrix satisfying $a_{ij} \ge 0$ for every pair $(i,j)$ of distinct indices.

(a) Using Exercise 3, show that

$$E(h; A) := (I_n - hA)^{-1} \ge 0$$

for $h > 0$ small enough.

(b) Deduce that $\exp(tA) \ge 0$ for every $t > 0$. Consider Trotter's formula

$$\exp tA = \lim_{m \to +\infty} E(t/m; A)^m,$$

where $\exp$ is the exponential of square matrices, defined in Chapter 7. Trotter's formula is justified by the convergence (see Exercise 10 in Chapter 7) of the implicit Euler method for the differential equation

$$\frac{dx}{dt} = Ax. \qquad (5.3)$$

(c) Deduce that if $x(0) \ge 0$, then the solution of (5.3) is nonnegative for every nonnegative $t$.

(d) Deduce also that

$$\sigma := \sup\{\Re\lambda;\ \lambda \in \mathrm{Sp}\,A\}$$

is an eigenvalue of $A$.

6. Let $A \in M_n(\mathbb{R})$ be a matrix satisfying $a_{ij} \ge 0$ for every pair $(i,j)$ of distinct indices.

(a) Let us define

$$\sigma := \sup\{\Re\lambda;\ \lambda \in \mathrm{Sp}\,A\}.$$

Among the eigenvalues of $A$ whose real parts equal $\sigma$, let us denote by $\mu$ the one with the largest imaginary part. Show that for every positive large enough real number $r$, $\rho(A + rI_n) = |\mu + r|$.

(b) Deduce that $\mu = \sigma$ (apply Theorem 5.2.1 to $A + rI_n$).

7. Let $B \in M_n(\mathbb{R})$ be a matrix whose off-diagonal entries are positive and such that the eigenvalues have strictly negative real parts. Show that there exists a nonnegative diagonal matrix $D$ such that $B' := D^{-1}BD$ is strictly diagonally dominant, namely,

$$|b'_{ii}| > \sum_{j \ne i} |b'_{ij}|, \qquad 1 \le i \le n.$$

8. Let $B \in M_n(\mathbb{R})$ be a nonnegative matrix and

$A :=$

(a) If an eigenvalue $\lambda$ of $A$ is associated to a positive eigenvector, show that there exist $\mu > \lambda$ and $Z > 0$ such that $BZ \ge \mu Z$. Deduce that $\lambda < \rho(B)$.

(b) Deduce that $A$ admits no strictly positive eigenvector (first of all, apply Theorem 5.2.1 to the matrix $A^T$).

9. (a) Let $B \in M_n(\mathbb{R})$ be given, with $\rho(B) = 1$. Assume that the eigenvalues of $B$ of modulus one are (algebraically) simple. Show that the sequence $(B^m)_{m \ge 1}$ is bounded.

(b) Let $M \in M_n(\mathbb{R})$ be a nonnegative irreducible matrix, with $\rho(M) = 1$. We denote by $x$ and $y^T$ the right and left eigenvectors for the eigenvalue 1 ($Mx = x$ and $y^TM = y^T$), normalized by $y^Tx = 1$. We define $L := xy^T$ and $B := M - L$.

i. Verify that $B - I_n$ is invertible. Determine the spectrum and the invariant subspaces of $B$ by means of those of $M$.

ii. Show that the sequence $(B^m)_{m \ge 1}$ is bounded. Express $M^m$ in terms of $B^m$.

iii. Deduce that

$$\lim_{N \to +\infty} \frac{1}{N}\sum_{m=0}^{N-1} M^m = L.$$

iv. Under what additional assumption do we have the stronger convergence

$$\lim_{N \to +\infty} M^N = L\,?$$

10. Let $B \in M_n(\mathbb{R})$ be a nonnegative irreducible matrix and let $C \in M_n(\mathbb{R})$ be a nonzero nonnegative matrix. For $t > 0$, we define $r_t := \rho(B + tC)$ and we let $X_t$ denote the nonnegative unitary eigenvector associated to the eigenvalue $r_t$.

(a) Show that $t \mapsto r_t$ is strictly increasing.

Define $r := \lim_{t \to +\infty} r_t$. We wish to show that $r = +\infty$. Let $X$ be a cluster point of the sequence $X_t$. We may assume, up to a permutation of the indices, that

$$X = \begin{pmatrix} Y \\ 0 \end{pmatrix}, \qquad Y > 0.$$

(b) Suppose that in fact, $r < +\infty$. Show that $BX \le rX$. Deduce that $B'Y = 0$, where $B'$ is a matrix extracted from $B$.

(c) Deduce that $X = Y$; that is, $X > 0$.

(d) Show, finally, that $CX = 0$. Conclude that $r = +\infty$.

(e) Assume, moreover, that $\rho(B) < 1$. Show that there exists one and only one $t \in \mathbb{R}_+$ such that $\rho(B + tC) = 1$.



11. Show that $\Delta_n$ is stable under multiplication. In particular, if $M$ is bistochastic, the sequence $(M^m)_{m \ge 1}$ is bounded.



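12. Let $M \in M_n(\mathbb{R})$ be a bistochastic irreducible matrix. Show that

$$\lim_{N \to +\infty} \frac{1}{N}\sum_{m=0}^{N-1} M^m = \frac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix} =: J_n$$

(use Exercise 9). Show by an example that the sequence $(M^m)_{m \ge 1}$ may or may not converge.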



13. Show directly that for every $p \in [1, \infty]$, $\|J_n\|_p = 1$, where $J_n$ was defined in the previous exercise.

14. Let $P \in GL_n(\mathbb{R})$ be given such that $P, P^{-1} \in \Delta_n$. Show that $P$ is a permutation matrix.



15. If $M \in \Delta_n$ is given, we define an equivalence relation between indices in the following way: $i' \mathcal{R} i''$ if there exists a sequence $i' = i_1, j_1, i_2, j_2, \dots, i_p = i''$ such that $m_{ij} > 0$ each time that $(i,j)$ is of the form $(i_l, j_l)$ or $(i_{l+1}, j_l)$ (compare with the proof of Theorem 5.5.1). Show that in the canonical decomposition of $M$, the $I_l$ are the equivalence classes of $\mathcal{R}$.

Deduce that the following matrix belongs to $S\Delta_n$:

$$\begin{pmatrix} 1/2 & 1/2 & 0 & \cdots & 0 \\ 1/2 & 0 & 1/2 & \ddots & \vdots \\ 0 & 1/2 & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & 0 & 1/2 \\ 0 & \cdots & 0 & 1/2 & 1/2 \end{pmatrix}$$

16. Let $M \in S\Delta_n$ and $M' \in \Delta_n$ be given. Show that $MM', M'M \in S\Delta_n$.

17. If $M \in S\Delta_n$, show that $\lim_{N \to +\infty} M^N$ exists.

18. Consider the induced norm $\|\cdot\|_p$ on $M_n(\mathbb{C})$. Let $M$ be a bistochastic matrix.

(a) Compute $\|M\|_1$ and $\|M\|_\infty$.

(b) Show that $\|M\| \ge 1$ for every induced norm.

(c) Deduce from Theorem 4.3.1 that $\|M\|_p = 1$. To what extent is this result different from Corollary 5.5.1?

19. Suppose that we are given three real symmetric matrices (or Hermitian matrices) $A$, $B$, $C = A + B$.

(a) If $t \in [0,1]$, consider the matrix $S(t) := A + tB$, so that $S(0) = A$ and $S(1) = C$. Arrange the eigenvalues of $S(t)$ in increasing order $\lambda_1(t) \le \cdots \le \lambda_n(t)$. For each value of $t$ there exists an orthonormal eigenbasis $\{X_1(t), \dots, X_n(t)\}$. We admit the fact that it can be chosen continuously with respect to $t$, so that $t \mapsto X_j(t)$ is continuous with a piecewise continuous derivative. Show that $\lambda_j'(t) = (BX_j(t), X_j(t))$.

(b) Let $\alpha_j, \beta_j, \gamma_j$ ($j = 1, \dots, n$) be the eigenvalues of $A, B, C$, respectively, those of $A$ and $C$ being in increasing order. Deduce from part (a) that

$$\gamma_j - \alpha_j = \int_0^1 (BX_j(t), X_j(t))\,dt.$$

(c) Let $\{Y_1, \dots, Y_n\}$ be an orthonormal eigenbasis, relative to $B$. Define

$$\sigma_{jk} := \int_0^1 |(X_j(t), Y_k)|^2\,dt.$$

Show that the matrix $\Sigma := (\sigma_{jk})_{1 \le j,k \le n}$ is bistochastic.

(d) Show that $\gamma_j - \alpha_j = \sum_k \sigma_{jk}\beta_k$. Deduce (Lidskii's theorem) that the vector $(\gamma_1 - \alpha_1, \dots, \gamma_n - \alpha_n)$ belongs to the convex hull of the vectors obtained from the vector $(\beta_1, \dots, \beta_n)$ by all possible permutations of the coordinates.
20. Let $a \in \mathbb{R}^n$ be given, $a = (a_1, \dots, a_n)$.

(a) Show that

$$C(a) := \{b \in \mathbb{R}^n \mid b \succ a\}$$

is a convex compact set. Characterize its extremal points.

(b) Show that

$$Y(a) := \{M \in \mathrm{Sym}_n(\mathbb{R}) \mid \mathrm{Sp}\,M \succ a\}$$

is a convex compact set. Characterize its extremal points.

(c) Deduce that $Y(a)$ is the closed convex hull (actually the convex hull) of the set

$$X(a) := \{M \in \mathrm{Sym}_n(\mathbb{R}) \mid \mathrm{Sp}\,M = a\}.$$

(d) Set $\alpha = s_n(a)/n$ and $a' := (\alpha, \dots, \alpha)$. Show that $a' \in C(a)$, and that $b \in C(a) \implies b \prec a'$.

(e) Characterize the set

$$\{M \in \mathrm{Sym}_n(\mathbb{R}) \mid \mathrm{Sp}\,M \succ a'\}.$$
Matrices with Entries in a Principal Ideal Domain; Jordan Reduction



6.1 Rings, Principal Ideal Domains 

In this chapter we consider commutative integral domains $A$ (see Chapter 2). In particular, such a ring $A$ can be embedded in its field of fractions, which is the quotient of $A \times (A \setminus \{0\})$ by the equivalence relation $(a,b)\,\mathcal{R}\,(c,d) \iff ad = bc$. The embedding is the map $a \mapsto (a, 1)$. In a ring $A$ the set of invertible elements is denoted by $A^*$. If $a, b \in A$ are such that $b = ua$ with $u \in A^*$, we say that $a$ and $b$ are associated, and we write $a \sim b$, which amounts to saying that $aA = bA$. If there exists $c \in A$ such that $ac = b$, we say that $a$ divides $b$ and write $a \mid b$. Then, if $a \ne 0$, the quotient $c$ is unique and is denoted by $b/a$. We say that $b$ is a prime, or irreducible, element if the equality $b = ac$ implies that one of the factors is invertible.

An ideal $I$ in a ring $A$ is an additive subgroup of $A$ such that $A \cdot I \subset I$: $a \in A$, $x \in I$ imply $ax \in I$. For example, if $b \in A$, the subset $bA$ is an ideal, denoted by $(b)$. Ideals of the form $(b)$ are called principal ideals.



6.1.1 Facts About Principal Ideal Domains 

Definition 6.1.1 A commutative ring $A$ is a principal ideal domain if every ideal in $A$ is principal: For every ideal $I$ there exists $a \in A$ such that $I = (a)$.

A field is a principal ideal domain that has only two ideals, $(0)$ and $(1)$. The set $\mathbb{Z}$ of rational integers and the polynomial algebra over a field $k$, denoted by $k[X]$, are also principal ideal domains. More generally, every Euclidean domain is a principal ideal domain (see Proposition 6.1.3 below).

In a commutative integral domain one says that $d$ is a greatest common divisor (gcd) of $a$ and $b$ if $d$ divides $a$ and $b$, and if every common divisor of $a$ and $b$ divides $d$. In other words, the set of common divisors of $a$ and $b$ admits $d$ as a greatest element for divisibility. The gcd of $a$ and $b$, whenever it exists, is unique up to multiplication by an invertible element. We say that $a$ and $b$ are coprime if all their common divisors are invertible; in that case, $\gcd(a,b) = 1$.

Proposition 6.1.1 In a principal ideal domain, every pair of elements has a greatest common divisor. The gcd satisfies the Bezout identity: For every $a, b \in A$, there exist $u, v \in A$ such that

$$\gcd(a,b) = ua + vb.$$

Such $u$ and $v$ can be chosen coprime.

Proof

Let $A$ be a principal ideal domain. If $a, b \in A$, the ideal $I := (a, b)$ spanned by $a$ and $b$, which is the set of elements of the form $xa + yb$, $x, y \in A$, is principal: $I = (d)$ for some $d \in A$. Since $a, b \in I$, $d$ divides $a$ and $b$. Furthermore, $d = ua + vb$ for some $u, v \in A$, because $d \in I$. If $c$ divides $a$ and $b$, then $c$ divides $ua + vb$; hence $c$ divides $d$, which is therefore a gcd of $a$ and $b$.

If $m$ divides $u$ and $v$, then $md \mid ua + vb$; hence $d = smd$ for some $s \in A$. If $d \ne 0$, one has $sm = 1$, which means that $m \in A^*$. Thus $u$ and $v$ are coprime. If $d = 0$, then $a = b = 0$, and one may take $u = v = 1$, which are coprime.

■ 

Let us remark that a gcd of $a$ and $b$ is a generator of the ideal $aA + bA$. It is thus nonunique. Every element associated to a gcd of $a$ and $b$ is another gcd. In certain rings one can choose the gcd in a canonical way, such as being positive in $\mathbb{Z}$ or monic in $k[X]$.

The gcd is associative: $\gcd(a, \gcd(b,c)) = \gcd(\gcd(a,b), c)$. It is therefore possible to speak of the gcd of an arbitrary finite subset of $A$. In the above example we denote it by $\gcd(a,b,c)$. At our disposal is a generalized Bezout formula: There exist elements $u_1, \dots, u_r \in A$ such that

$$\gcd(a_1, \dots, a_r) = a_1u_1 + \cdots + a_ru_r.$$
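In $\mathbb{Z}$, the Bezout coefficients of Proposition 6.1.1 are computed by the extended Euclidean algorithm, and folding it over a list (using the associativity of the gcd) gives the generalized formula. A minimal sketch, normalizing the gcd to be nonnegative as suggested above for $\mathbb{Z}$; the function names are ours.

```python
from math import gcd

def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) >= 0 and d = u*a + v*b (extended Euclid in Z)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r       # invariant: old_r = old_u*a + old_v*b, r = u*a + v*b
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:                         # choose the nonnegative gcd
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v

def bezout(values):
    """Generalized Bezout: (d, [u_1, ..., u_r]) with d = gcd(a_1, ..., a_r) = sum of a_i u_i."""
    d, coeffs = 0, []
    for a in values:
        d, s, t = extended_gcd(d, a)      # new gcd = s * (previous gcd) + t * a
        coeffs = [s * c for c in coeffs] + [t]
    return d, coeffs

d, u, v = extended_gcd(240, 46)
print(d, u, v, gcd(u, v))                 # 2 -9 47 1: u and v are coprime
print(bezout([12, 18, 8]))                # (2, [1, -1, 1]), i.e. 12 - 18 + 8 = 2
```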

Definition 6.1.2 A ring $A$ is Noetherian if every nondecreasing (for inclusion) sequence of ideals is constant beyond some index: $I_0 \subset I_1 \subset \cdots \subset I_m \subset \cdots$ implies that there is an $l$ such that $I_l = I_{l+1} = \cdots$.

Proposition 6.1.2 The principal ideal domains are Noetherian.

Observe that in the case of principal ideal domains the Noetherian property means exactly that if a sequence $a_1, a_2, \dots$ of elements of $A$ is such that every element is divisible by the next one, then there exists an index $J$ such that the $a_j$'s are pairwise associated for every $j \ge J$.
This property seems natural because it is shared by all the rings encountered in number theory. But the ring of entire holomorphic functions is not Noetherian: Just take for $a_n$ the function

$$z \mapsto \Big(\prod_{k=1}^n (z - k)\Big)^{-1}\sin 2\pi z.$$

Each $a_n$ is divisible by $a_{n+1}$, since $a_n(z) = (z - n - 1)a_{n+1}(z)$, but $a_n$ and $a_{n+1}$ are not associated.

Proof

Let $A$ be a principal ideal domain and let $(I_j)_{j \ge 0}$ be a nondecreasing sequence of ideals in $A$. Let $I$ be their union. This sequence is nondecreasing under inclusion, so that $I$ is an ideal. Let $a$ be a generator: $I = (a)$. Then $a$ belongs to one of the ideals, say $a \in I_k$. Hence $I \subset I_k$, which implies $I_j = I$ for $j \ge k$.

■



We remark that the proof works with slight changes if we know that every ideal in $A$ is spanned by a finite set. For example, the ring of polynomials over a Noetherian ring is itself Noetherian: $\mathbb{Z}[X]$ and $k[X,Y]$ are Noetherian rings.

The principal ideal domains are also factorial (a short term for unique factorization domain): Every nonzero, noninvertible element of $A$ admits a factorization into prime factors. This factorization is unique up to ambiguities, which may be of three types: the order of factors, the presence of invertible elements, and the replacement of factors by associated ones. This property is fundamental to the arithmetic in $A$.



6.1.2 Euclidean Domains 

Definition 6.1.3 A Euclidean domain is a ring $A$ endowed with a map $N : A \setminus \{0\} \to \mathbb{N}$ such that for every $a, b \in A$ with $b \ne 0$, there exists a unique pair $(q, r) \in A \times A$ such that $a = qb + r$ with $N(r) < N(b)$ (Euclidean division).

A special case of Euclidean division occurs when $b$ divides $a$. Then $r = 0$ and we conclude that $N(b) > N(0)$ for every $b \ne 0$.

Classical examples of Euclidean domains are the ring of the rational integers with $N(a) = |a|$, the ring $k[X]$ of polynomials over a field $k$, with $N(P) = \deg P$,¹ and the ring of Gaussian integers $\mathbb{Z}[i]$ with $N(z) = |z|^2$. Observe that if $b$ is nonzero, the Euclidean division of $b$ by itself shows that $N(b)$ is positive. The function $N$ is often called a norm, though it does not resemble the norm on a real or complex vector space. In practice, one may define $N(0)$ in a consistent way by $0$ if $b \ne 0 \implies N(b) > 0$ (case of $\mathbb{Z}$ and $\mathbb{Z}[i]$) and by $-\infty$ otherwise (case of $k[X]$). With that extension, the pair $(q, r)$ in the definition is uniquely defined by $a = bq + r$ and $N(r) < N(b)$.

¹ One may also take $N(P) = 1 + \deg P$ if $P$ is nonzero, and $N(0) = 0$.
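Euclidean division in $k[X]$ with $N(P) = \deg P$ and $N(0) = -\infty$ can be sketched for $k = \mathbb{Q}$ with exact rationals. The representation is a choice made for this illustration: a polynomial is a list of coefficients, constant term first, and the empty list is the zero polynomial.

```python
from fractions import Fraction

def _strip(p):
    """Drop zero leading coefficients (stored at the end of the list)."""
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Euclidean division a = q*b + r in Q[X] with deg r < deg b.
    Polynomials are coefficient lists, constant term first; [] is the zero polynomial."""
    r = _strip([Fraction(c) for c in a])
    b = _strip([Fraction(c) for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1] / b[-1]                # cancels the leading term of r
        q[shift] = c
        for i, bc in enumerate(b):
            r[shift + i] -= c * bc
        _strip(r)
    return q, r

print(poly_divmod([-1, 0, 0, 1], [-1, 1]))   # X^3 - 1 = (X^2 + X + 1)(X - 1) + 0
```

Each pass removes the leading term of the remainder, so the loop stops as soon as $\deg r < \deg b$, which is exactly the condition $N(r) < N(b)$.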

Proposition 6.1.3 Euclidean domains are principal ideal domains.

Proof

Let $I$ be an ideal of a Euclidean domain $A$. If $I = (0)$, there is nothing to show. Otherwise, let us select in $I \setminus \{0\}$ an element $a$ of minimal norm. If $b \in I$, the remainder $r = b - qa$ of the Euclidean division of $b$ by $a$ is an element of $I$ and satisfies $N(r) < N(a)$. The minimality of $N(a)$ implies $r = 0$, that is, $a \mid b$. Finally, $I = (a)$.

■ 

The converse of Proposition 6.1.3 is not true. For example, the quadratic ring $\mathbb{Z}[(1 + i\sqrt{19})/2]$ is a principal ideal domain, though not Euclidean. More information about rings of quadratic integers can be found in Cohn's monograph [10].

6.1.3 Elementary Matrices

An elementary matrix of order $n$ is a matrix of one of the following forms:

• The transposition matrices: If $\sigma \in S_n$, the matrix $P^\sigma$ has entries $p_{ij} = \delta_{i\sigma(j)}$, where $\delta$ is the Kronecker symbol.

• The matrices $I_n + aE_{ik}$, for $a \in A$ and $1 \le i \ne k \le n$, with $(E_{ik})_{lm} = \delta_{il}\delta_{km}$.

• The diagonal invertible matrices, that is, those whose diagonal entries are invertible in $A$.

We observe that the inverse of an elementary matrix is again elementary. For example, $(I_n + aE_{ik})(I_n - aE_{ik}) = I_n$.

Theorem 6.1.1 A square invertible matrix of size n with entries in a 
Euclidean domain A is a product of elementary matrices with entries in 
A. 



Proof 

We shall prove the theorem for n = 2. The general case will be deduced 
from that particular one and from the proof of Theorem 6.2.1 below, since 
the matrices used in that proof are block-diagonal with 1x1 and 2x2 
diagonal blocks. 

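Let

$$M = \begin{pmatrix} a & a_1 \\ c & d \end{pmatrix}$$

be given in $GL_2(A)$: we have $ad - a_1c \in A^*$. If $N(a) < N(a_1)$, we multiply $M$ on the right by

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

We are now in the case $N(a_1) \le N(a)$. Let $a = a_1q + a_2$ be the Euclidean division of $a$ by $a_1$. Then

$$M\begin{pmatrix} 1 & 0 \\ -q & 1 \end{pmatrix} = \begin{pmatrix} a_2 & a_1 \\ c - qd & d \end{pmatrix} =: M'.$$

Next, we have

$$M'\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ d & c - qd \end{pmatrix} =: M_1,$$

with $N(a_2) < N(a_1)$. We thus construct a sequence of matrices $M_k$ of the form

$$\begin{pmatrix} a_{k-1} & a_k \\ \cdot & \cdot \end{pmatrix},$$

with $a_{k-1} \ne 0$, each one the product of the previous one by elementary matrices. Furthermore, $N(a_k) < N(a_{k-1})$. Since $N$ takes its values in $\mathbb{N}$, this sequence is finite, and there is a step for which $a_k = 0$. The matrix $M_k$, being triangular and invertible, has an invertible diagonal $D$. Then $M_kD^{-1}$ has the form

$$\begin{pmatrix} 1 & 0 \\ \cdot & 1 \end{pmatrix},$$

which is an elementary matrix. Hence $M$ is a product of elementary matrices.

■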



Again, the statement is false in a general principal ideal domain. Whether $GL_n(A)$ equals the group spanned by elementary matrices is a difficult question of $K$-theory.
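The proof of Theorem 6.1.1 for $n = 2$ translates directly into an algorithm when $A = \mathbb{Z}$ with $N(a) = |a|$. The sketch below follows the same steps (a column swap, then right multiplication by $\begin{pmatrix} 1 & 0 \\ -q & 1 \end{pmatrix}$ and another swap) and records the inverses of these factors, so that $M$ is recovered as a product of transposition matrices, matrices $I_2 + bE_{21}$, and a diagonal invertible matrix. The function names are ours.

```python
from functools import reduce

def mul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def product(factors):
    return reduce(mul, factors, [[1, 0], [0, 1]])

SWAP = [[0, 1], [1, 0]]            # transposition matrix, its own inverse

def elementary_factors(M):
    """Elementary matrices over Z whose product, in this order, equals M in GL_2(Z)."""
    if M[0][0] * M[1][1] - M[0][1] * M[1][0] not in (1, -1):
        raise ValueError("M is not invertible over Z")
    M = [row[:] for row in M]
    tail = []                      # inverses of the right factors: M = M_k * product(tail)
    if abs(M[0][0]) < abs(M[0][1]):
        M = mul(M, SWAP)
        tail.insert(0, SWAP)
    while M[0][1] != 0:
        q = M[0][0] // M[0][1]     # Euclidean division a = a_1 q + a_2 with |a_2| < |a_1|
        M = mul(M, [[1, 0], [-q, 1]])
        tail.insert(0, [[1, 0], [q, 1]])
        M = mul(M, SWAP)
        tail.insert(0, SWAP)
    (a, _), (c, d) = M             # M_k is lower triangular; a and d are units (+1 or -1)
    L = [[1, 0], [c * a, 1]]       # M_k D^{-1}; here c / a = c * a since a = +-1
    D = [[a, 0], [0, d]]
    return [L, D] + tail

factors = elementary_factors([[2, 3], [1, 2]])
print(product(factors) == [[2, 3], [1, 2]])      # True
```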



6.2 Invariant Factors of a Matrix 

Theorem 6.2.1 Let $M \in M_{n\times m}(A)$ be a matrix with entries in a principal ideal domain. Then there exist two invertible matrices $P \in GL_n(A)$, $Q \in GL_m(A)$ and a quasi-diagonal matrix $D \in M_{n\times m}(A)$ (that is, $d_{ij} = 0$ for $i \ne j$) such that:

• on the one hand, $M = PDQ$,

• on the other hand, $d_1 \mid d_2, \dots, d_i \mid d_{i+1}, \dots$, where the $d_j$ are the diagonal entries of $D$.
Furthermore, if $M = P'D'Q'$ is another decomposition with these two properties, the scalars $d_j$ and $d'_j$ are associated. Up to invertible elements, they are thus unique.

Definition 6.2.1 For this reason, the scalars $d_1, \dots, d_r$ ($r = \min(n,m)$) are called the invariant factors of $M$.
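The uniqueness argument in the proof below characterizes the invariant factors through $d_1 \cdots d_k = D_k(M)$, the gcd of the minors of order $k$. For small integer matrices this gives a direct, if expensive, way to compute them. In the sketch below the exact determinant via rational Gaussian elimination is an implementation choice, and the example matrix is arbitrary.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd

def det(rows):
    """Exact determinant of a square integer matrix (Gaussian elimination over Q)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n, d = len(A), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] != 0), None)
        if piv is None:
            return 0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return int(d)

def determinantal_divisors(M):
    """D_k(M) = gcd of all k x k minors, k = 1..min(n, m) (nonnegative representatives)."""
    n, m = len(M), len(M[0])
    D = []
    for k in range(1, min(n, m) + 1):
        g = 0
        for I in combinations(range(n), k):
            for J in combinations(range(m), k):
                g = gcd(g, det([[M[i][j] for j in J] for i in I]))
        D.append(g)
    return D

def invariant_factors(M):
    """d_k = D_k / D_{k-1}, from d_1 ... d_k = D_k(M); d_k = 0 once D_k = 0."""
    factors, prev = [], 1
    for Dk in determinantal_divisors(M):
        factors.append(Dk // prev if prev != 0 else 0)
        prev = Dk
    return factors

W = [[2, 4, 4], [-6, 6, 12], [10, -4, -16]]
print(determinantal_divisors(W), invariant_factors(W))   # [2, 12, 144] [2, 6, 12]
```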

Proof 

Uniqueness: for $k \le r$, let us denote by $D_k(N)$ the gcd of minors of order $k$ of the matrix $N$. From Corollary 2.1.1, we have $D_k(M) = D_k(D) = D_k(D')$. It is immediate that $D_k(D) = d_1 \cdots d_k$ (because the minors of order $k$ are either null, or products of $k$ terms $d_j$ with distinct subscripts; the product $d_1 \cdots d_k$ is one of them and divides all the others, since $d_i \mid d_{i+1}$), so that

$$d_1 \cdots d_k = u_k\,d'_1 \cdots d'_k, \qquad 1 \le k \le r,$$

for some $u_k \in A^*$. Hence, $d_1$ and $d'_1$ are associated. Since $A$ is an integral domain, we also have $d'_k = u_k^{-1}u_{k-1}d_k$. In other words, $d_k$ and $d'_k$ are associated.

Existence: We see from the above that the $d_j$'s are determined by the equalities $d_1 \cdots d_j = D_j(M)$. In particular, $d_1$ is the gcd of the entries of $M$. Hence the first step consists in finding a matrix $M'$, equivalent to $M$, such that $m'_{11}$ is equal to this gcd.

To do so, we construct a sequence of equivalent matrices $M^{(p)}$, with $M^{(0)} = M$, such that $m^{(p)}_{11}$ divides $m^{(p-1)}_{11}$. Given the matrix $N := M^{(p-1)}$, we distinguish four cases:

1. $n_{11}$ divides $n_{21}, \dots, n_{n1}$ but does not divide $n_{1j}$. Then $d := \gcd(n_{11}, n_{1j})$ reads $d = un_{11} + vn_{1j}$. Let us define $w := -n_{1j}/d$ and $z := n_{11}/d$ and let us define a matrix $Q \in GL_m(A)$ by:

• $q_{11} = u$, $q_{j1} = v$, $q_{1j} = w$, $q_{jj} = z$,

• $q_{kl} = \delta_{kl}$ otherwise.

($Q$ is invertible, since $uz - vw = (un_{11} + vn_{1j})/d = 1$.) Then $M^{(p)} := M^{(p-1)}Q$ is suitable, because $m^{(p)}_{11} = d \mid n_{11} = m^{(p-1)}_{11}$.

2. $n_{11}$ divides each $n_{1j}$, as well as $n_{21}, \dots, n_{i-1,1}$, but does not divide $n_{i1}$. This case is symmetric to the previous one. Multiplication on the left by a suitable $P \in GL_n(A)$ furnishes $M^{(p)}$, with $m^{(p)}_{11} = \gcd(n_{11}, n_{i1}) \mid m^{(p-1)}_{11}$.

3. $n_{11}$ divides each $n_{1j}$ and each $n_{i1}$, but does not divide some $n_{ij}$ with $i, j \ge 2$. Then $n_{i1} = an_{11}$. Let us define a matrix $P \in GL_n(A)$ by

• $p_{11} = a + 1$, $p_{i1} = 1$, $p_{1i} = -1$, $p_{ii} = 0$;

• $p_{kl} = \delta_{kl}$ otherwise.

If we then set $N' = PN$, we have $n'_{11} = n_{11}$ and $n'_{1j} = (a+1)n_{1j} - n_{ij}$, which is not divisible by $n_{11}$. We have thus returned to the first case, and there exists an equivalent matrix with $m^{(p)}_{11} = \gcd(n_{11}, n'_{1j}) \mid n_{11} = m^{(p-1)}_{11}$.

4. $n_{11}$ divides all the entries of the matrix $N$. In that case, $M^{(p)} := M^{(p-1)}$.

It is essential to observe that in the first three cases, $m^{(p)}_{11}$ is not associated to $m^{(p-1)}_{11}$, though it divides it.

From Proposition 6.1.2, the elements of the sequence $(m^{(p)}_{11})_{p \ge 0}$ are pairwise associated, once $p$ is large enough. We are then in the last of the four cases above: $m^{(p)}_{11}$ divides all the $m^{(p)}_{ij}$'s. We have $m^{(p)}_{i1} = a_im^{(p)}_{11}$ and $m^{(p)}_{1j} = b_jm^{(p)}_{11}$. Then let $P \in GL_n(A)$ and $Q \in GL_m(A)$ be the matrices defined by:

• $p_{ii} = 1$, $p_{i1} = -a_i$ if $i \ge 2$, $p_{ij} = 0$ otherwise,

• $q_{jj} = 1$, $q_{1j} = -b_j$ if $j \ge 2$, $q_{ij} = 0$ otherwise.

The matrix $M' := PM^{(p)}Q$ is equivalent to $M^{(p)}$, hence to $M$. It has the form

$$M' = \begin{pmatrix} m & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & M'' & \\ 0 & & & \end{pmatrix},$$

where $m$ divides all the entries of $M''$. Obviously, $m = D_1(M') = D_1(M)$.

Having shown that every matrix $M$ is equivalent to a matrix of the form described above, one may argue by induction on the size of $M$ (that is, on the integer $r = \min(n,m)$). If $r = 1$, we have just proved the claim. If $r \ge 2$ and if the claim is true up to the order $r - 1$, we apply the induction hypothesis to the factor $M'' \in M_{(n-1)\times(m-1)}(A)$ in the above reduction: there exist $P'' \in GL_{n-1}(A)$ and $Q'' \in GL_{m-1}(A)$ such that $P''M''Q''$ is quasi-diagonal, with diagonal entries $d_2, \dots, d_r$ ordered by $d_l \mid d_{l+1}$ for $l \ge 2$. From the uniqueness step, $d_2 = D_1(M'')$. Since $m$ divides the entries of $M''$, we have $m \mid d_2$. Let us then define $P' = \mathrm{diag}(1, P'')$ and $Q' = \mathrm{diag}(1, Q'')$, which are invertible: $P'M'Q'$ is quasi-diagonal, with diagonal entries $d_1 = m, d_2, \dots$, a nondecreasing sequence (according to the division in $A$). Since $M$ is equivalent to $M'$, this proves the existence part of the theorem.

■
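The existence proof is effective when $A = \mathbb{Z}$. The sketch below follows its spirit rather than its letter: instead of the Bezout matrix of case 1 it always moves an entry of least absolute value to position $(1,1)$ and clears the first row and column by Euclidean division, and case 3 becomes "add a row containing an entry not divisible by the pivot". It returns only the diagonal of $D$, not $P$ and $Q$. The function name is ours.

```python
def smith_diagonal(M):
    """Invariant factors (nonnegative representatives) of an integer matrix M."""
    A = [list(row) for row in M]
    n, m = len(A), len(A[0])
    r = min(n, m)
    factors, t = [], 0
    while t < r:
        nonzero = [(i, j) for i in range(t, n) for j in range(t, m) if A[i][j] != 0]
        if not nonzero:
            break                               # the remaining block vanishes
        # Move an entry of least absolute value to position (t, t).
        i0, j0 = min(nonzero, key=lambda ij: abs(A[ij[0]][ij[1]]))
        A[t], A[i0] = A[i0], A[t]
        for row in A:
            row[t], row[j0] = row[j0], row[t]
        p = A[t][t]
        # Euclidean division of column t and row t by the pivot (row and column operations).
        for i in range(t + 1, n):
            q = A[i][t] // p
            A[i] = [x - q * y for x, y in zip(A[i], A[t])]
        for j in range(t + 1, m):
            q = A[t][j] // p
            for row in A:
                row[j] -= q * row[t]
        if any(A[i][t] for i in range(t + 1, n)) or any(A[t][j] for j in range(t + 1, m)):
            continue                            # a nonzero remainder gives a smaller pivot
        bad = [i for i in range(t + 1, n) for j in range(t + 1, m) if A[i][j] % p != 0]
        if bad:                                 # analogue of case 3
            A[t] = [x + y for x, y in zip(A[t], A[bad[0]])]
            continue
        factors.append(abs(p))                  # analogue of case 4: pivot divides everything
        t += 1
    return factors + [0] * (r - len(factors))

W = [[2, 4, 4], [-6, 6, 12], [10, -4, -16]]
print(smith_diagonal(W))                        # [2, 6, 12]
```

The absolute value of the pivot strictly decreases each time the loop restarts, which plays the role of Proposition 6.1.2 in the proof.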



6.2.1 Comments 

In the list of invariant factors of a matrix some $d_j$'s may equal zero. In that case, $d_j = 0$ implies $d_{j+1} = \cdots = d_r = 0$. Moreover, some invariant factor may occur several times in the list $d_1, \dots, d_r$, up to association. The number of times that a factor $d$ or its associates occur is its multiplicity.

If $m = n$ and if the invariant factors of a matrix $M$ are $(1, \dots, 1)$, then $D = I_n$, and $M = PQ$ is invertible. Conversely, if $M$ is invertible, then the decomposition $M = M I_n I_n$ shows that $d_1 = \cdots = d_n = 1$.

If $A$ is a field, then there are only two ideals: $A = (1)$ itself and $(0)$. The list of invariant factors of a matrix is thus of the form $(1, \dots, 1, 0, \dots, 0)$. Of course, there may be no 1's (for the matrix $0_{n\times m}$), or no 0's. There are thus exactly $\min(n,m) + 1$ classes of equivalent matrices in $M_{n\times m}(A)$, two matrices being equivalent if and only if they have the same rank $q$. The rank is then the number of 1's among the invariant factors. The decomposition $M = PDQ$ is then called the rank decomposition.

Theorem 6.2.2 Let $k$ be a field and $M \in \mathbf{M}_{n\times m}(k)$ a matrix. Let $q$ be the rank of $M$, that is, the dimension of the linear subspace of $k^n$ spanned by the columns of $M$. Then there exist two square invertible matrices $P, Q$ such that $M = PDQ$ with $d_{ii} = 1$ if $i \leq q$ and $d_{ij} = 0$ in all other cases.



6.3 Similarity Invariants and Jordan Reduction 

From now on, $k$ will denote a field and $A = k[X]$ the ring of polynomials over $k$. This ring is Euclidean, hence a principal ideal domain. In the sequel, the results are effective, in the sense that the normal forms that we define will be obtained by means of an algorithm that uses right or left multiplications by elementary matrices of $\mathbf{M}_n(A)$, the computations being based upon the Euclidean division of polynomials.

Given a matrix $B \in \mathbf{M}_n(k)$ (a square matrix with constant entries, in the sense that they are not polynomials), we consider the matrix $XI_n - B \in \mathbf{M}_n(A)$, where $X$ is the indeterminate in $A$.

Definition 6.3.1 If $B \in \mathbf{M}_n(k)$, the invariant factors of $M := XI_n - B$ are called invariant polynomials of $B$, or similarity invariants of $B$.

This definition is justified by the following statement. 

Theorem 6.3.1 Two matrices in $\mathbf{M}_n(k)$ are similar if and only if they have the same list of invariant polynomials (counted with their multiplicities).

This theorem is a particular case of a more general one: 

Theorem 6.3.2 Let $A_0, A_1, B_0, B_1$ be matrices in $\mathbf{M}_n(k)$, with $A_0, A_1$ invertible. Then the matrices $XA_0 + B_0$ and $XA_1 + B_1$ are equivalent (in $\mathbf{M}_n(A)$) if and only if there exist $G, H \in \mathbf{GL}_n(k)$ such that

$$GA_0 = A_1H, \qquad GB_0 = B_1H.$$
When $A_0 = A_1 = I_n$, Theorem 6.3.2 tells us that $XI_n - B_0$ and $XI_n - B_1$ are equivalent, namely that they have the same invariant polynomials, if and only if there exists $P \in \mathbf{GL}_n(k)$ such that $PB_0 = B_1P$. This is the criterion given by Theorem 6.3.1.

Proof 

We prove Theorem 6.3.2. The condition is clearly sufficient. 

Conversely, if $XA_0 + B_0$ and $XA_1 + B_1$ are equivalent, there exist matrices $P, Q \in \mathbf{GL}_n(A)$ such that $P(XA_0 + B_0) = (XA_1 + B_1)Q$. Since $A_1$ is invertible, one may perform Euclidean division¹ of $P$ by $XA_1 + B_1$ on the right:

$$P = (XA_1 + B_1)P_1 + G,$$

where $G$ is a matrix whose entries are constant polynomials. We warn the reader that since $\mathbf{M}_n(k)$ is not commutative, Euclidean division may be done either on the right or on the left, with distinct quotients and distinct remainders. Likewise, we have $Q = Q_1(XA_0 + B_0) + H$ with $H \in \mathbf{M}_n(k)$. Let us write, then,

$$(XA_1 + B_1)(P_1 - Q_1)(XA_0 + B_0) = (XA_1 + B_1)H - G(XA_0 + B_0).$$

Define the degree of a matrix as the supremum of the degrees of its entries. Since $A_0$ and $A_1$ are invertible, the left-hand side of this equality has degree $2 + \deg(P_1 - Q_1) \geq 2$ unless $P_1 = Q_1$. The right-hand side has degree less than or equal to one. The two sides, being equal, must therefore vanish, and we conclude that

$$GA_0 = A_1H, \qquad GB_0 = B_1H.$$

It remains to show that $G$ and $H$ are invertible. To do so, let us define $R \in \mathbf{M}_n(A)$ as the inverse matrix of $P$ (which exists by assumption). We still have

$$R = (XA_0 + B_0)R_1 + K, \qquad K \in \mathbf{M}_n(k).$$

Combining the equalities stated above, we obtain

$$I_n - GK = (XA_1 + B_1)(QR_1 + P_1K).$$

The left-hand side is constant, while the right-hand side has degree $1 + \deg(QR_1 + P_1K)$ unless it vanishes. Hence we must have $I_n = GK$, so that $G$ is invertible. Likewise, $H$ is invertible.

■ 

We conclude this paragraph with a remarkable statement: 

Theorem 6.3.3 If $B \in \mathbf{M}_n(k)$, then $B$ and $B^T$ are similar.

Indeed, $XI_n - B$ and $XI_n - B^T$ are transposes of each other. Hence they have the same list of minors, and therefore the same invariant factors.



¹ The fact that $A_1$ is invertible is essential, since the ring $\mathbf{M}_n(A)$ is not an integral domain.
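6.3.1 Example: The Companion Matrix of a Polynomial

Given a polynomial

$$P(X) = X^n + a_1X^{n-1} + \cdots + a_n,$$

there exists a matrix $B \in \mathbf{M}_n(k)$ such that the list of invariant factors of the matrix $XI_n - B$ is $(1, \dots, 1, P)$. We may take the companion matrix associated to $P$ to be

$$B_P = \begin{pmatrix} 0 & \cdots & \cdots & 0 & -a_n \\ 1 & \ddots & & \vdots & \vdots \\ 0 & \ddots & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & -a_2 \\ 0 & \cdots & 0 & 1 & -a_1 \end{pmatrix}.$$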
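Naturally, any matrix similar to $B_P$ would do as well. Indeed, if $B = Q^{-1}B_PQ$, then $XI_n - B$ is similar, hence equivalent, to $XI_n - B_P$. To show that the invariant factors of $XI_n - B_P$ are the polynomials $(1, \dots, 1, P)$, we observe that $XI_n - B_P$ possesses a minor of order $n - 1$ that is invertible (in $A$). This minor is the determinant of the submatrix

$$\begin{pmatrix} -1 & X & 0 & \cdots & 0 \\ 0 & -1 & X & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & -1 & X \\ 0 & \cdots & \cdots & 0 & -1 \end{pmatrix},$$

obtained by deleting the first row and the last column. Its determinant is $(-1)^{n-1}$.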
We thus have $D_{n-1}(XI_n - B_P) = 1$, so that the invariant factors $d_1, \dots, d_{n-1}$ are all equal to 1. Hence $d_n = D_n(XI_n - B_P) = \det(XI_n - B_P)$, the characteristic polynomial of $B_P$, namely $P$.
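As a quick check of the identity $\det(XI_n - B_P) = P$, the following sketch builds $B_P$ with ones on the subdiagonal and last column $(-a_n, \dots, -a_1)^T$. Exercise 2 below asks for a proof of this identity.

The sketch computes the characteristic polynomial by the Faddeev–LeVerrier recursion. This is a standard method, but the text does not use it.

Exact rational arithmetic comes from Python's `fractions` module. The helper names are ours.

```python
from fractions import Fraction


def companion(a):
    """Companion matrix of P(X) = X^n + a[0] X^(n-1) + ... + a[n-1]:
    ones on the subdiagonal, last column (-a_n, ..., -a_1) read top to bottom."""
    n = len(a)
    B = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        B[i][i - 1] = Fraction(1)
    for i in range(n):
        B[i][n - 1] = -Fraction(a[n - 1 - i])
    return B


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def charpoly(A):
    """Coefficients [1, c_1, ..., c_n] of det(X I - A) = X^n + c_1 X^(n-1) + ... + c_n,
    by the recursion M_k = A M_{k-1} + c_{k-1} I, c_k = -tr(A M_k) / k (with c_0 = 1)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs
```

For example, `charpoly(companion([-6, 11, -6]))` gives `[1, -6, 11, -6]`, the coefficients of $X^3 - 6X^2 + 11X - 6$.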

In this example $P$ is also the minimal polynomial of $B_P$. In fact, let $Q$ be a polynomial of degree less than or equal to $n - 1$,

$$Q(X) = b_0X^{n-1} + \cdots + b_{n-1}.$$

Since $B_Pe^j = e^{j+1}$ for $j < n$, the vector $Q(B_P)e^1$ reads

$$b_0e^n + \cdots + b_{n-1}e^1.$$

Hence $Q(B_P) = 0$ and $\deg Q \leq n - 1$ imply $Q = 0$. The minimal polynomial is thus of degree at least $n$. It is therefore equal to the characteristic polynomial.
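The key fact in this argument is that $B_P$ sends $e^j$ to $e^{j+1}$ for $j < n$. The hypothetical helper below evaluates $Q(B_P)e^1$ by Horner's scheme, using only matrix–vector products, and returns the coordinates of the result.

These coordinates are the coefficients of $Q$ in reverse order. Consequently, $Q(B_P)e^1 = 0$ forces $Q = 0$ when $\deg Q \leq n - 1$.

The companion layout is assumed: ones on the subdiagonal, last column $-a_n, \dots, -a_1$.

```python
def companion(a):
    """Companion matrix of X^n + a[0] X^(n-1) + ... + a[n-1]
    (ones on the subdiagonal, last column -a_n, ..., -a_1 from top to bottom)."""
    n = len(a)
    B = [[0] * n for _ in range(n)]
    for i in range(1, n):
        B[i][i - 1] = 1
    for i in range(n):
        B[i][n - 1] = -a[n - 1 - i]
    return B


def matvec(B, v):
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def poly_times_e1(b, B):
    """Coordinates of Q(B) e^1 for Q(X) = b[0] X^(n-1) + ... + b[n-1], via Horner's scheme:
    Q(B) e^1 = (...((b_0 B + b_1) B + b_2)...) e^1."""
    n = len(B)
    e1 = [1] + [0] * (n - 1)
    v = [b[0] * x for x in e1]
    for c in b[1:]:
        v = [x + c * y for x, y in zip(matvec(B, v), e1)]
    return v
```

Here `poly_times_e1([5, -2, 7], companion([-6, 11, -6]))` returns `[7, -2, 5]`, that is, $7e^1 - 2e^2 + 5e^3$.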

6.3.2 First Canonical Form of a Square Matrix

Let $M \in \mathbf{M}_n(k)$ be a square matrix and $P_1, \dots, P_n \in k[X]$ its similarity invariants. The sum of their degrees $n_j$ ($1 \leq j \leq n$) is $n$. Let us denote by $M^{(j)} \in \mathbf{M}_{n_j}(k)$ the companion matrix of the polynomial $P_j$. Let us form the block-diagonal matrix $M'$ whose diagonal blocks are the $M^{(j)}$'s.

The first few polynomials $P_j$ are generally constant. We shall see below that the only case where $P_1$ is not constant corresponds to $M = aI_n$. The blocks corresponding to constant $P_j$ are empty, as are the corresponding rows and columns. To be precise, the actual number $m$ of diagonal blocks is equal to the number of nonconstant similarity invariants.

Since the matrix $XI_{n_j} - M^{(j)}$ is equivalent to the matrix $N^{(j)} := \operatorname{diag}(1, \dots, 1, P_j)$, we have

$$XI_{n_j} - M^{(j)} = P^{(j)}N^{(j)}Q^{(j)},$$

where $P^{(j)}, Q^{(j)} \in \mathbf{GL}_{n_j}(k[X])$. Let us form matrices $P, Q \in \mathbf{GL}_n(k[X])$ by

$$P = \operatorname{diag}(P^{(1)}, \dots, P^{(n)}), \qquad Q = \operatorname{diag}(Q^{(1)}, \dots, Q^{(n)}).$$

We obtain

$$XI_n - M' = PNQ, \qquad N = \operatorname{diag}(N^{(1)}, \dots, N^{(n)}).$$

Here $N$ is a diagonal matrix whose diagonal entries are the similarity invariants of $M$, up to the order. In fact, each nonconstant $P_j$ appears in the associated block $N^{(j)}$. The other diagonal terms are the constant 1, which occurs $n - m$ times; these are the polynomials $P_1, \dots, P_{n-m}$, as expected.

Conjugating by a permutation matrix, we obtain that $XI_n - M'$ is equivalent to the matrix $\operatorname{diag}(P_1, \dots, P_n)$. Hence $XI_n - M'$ is equivalent to $XI_n - M$. From Theorem 6.3.1, $M$ and $M'$ are similar.

Theorem 6.3.4 Let $k$ be a field, $M \in \mathbf{M}_n(k)$ a square matrix, and $P_1, \dots, P_n$ its similarity invariants. Then $M$ is similar to the block-diagonal matrix $M'$ whose $j$th diagonal block is the companion matrix of $P_j$.

The matrix $M'$ is called the first canonical form of $M$, or the Frobenius canonical form of $M$.

Remark: Let $L$ be an extension of $k$ (namely, a field containing $k$) and let $M \in \mathbf{M}_n(k)$. Then $M \in \mathbf{M}_n(L)$. Let $P_1, \dots, P_n$ be the similarity invariants of $M$ as a matrix with entries in $k$. Then $XI_n - M = P\operatorname{diag}(P_1, \dots, P_n)Q$, where $P, Q \in \mathbf{GL}_n(k[X])$. Since $P$, $Q$, their inverses, and the diagonal matrix also belong to $\mathbf{M}_n(L[X])$, $P_1, \dots, P_n$ are the similarity invariants of $M$ as a matrix with entries in $L$.

In other words, the similarity invariants depend on $M$ but not on the field $k$. To compute them, it is enough to place ourselves in the smallest possible field, namely the one generated by the entries of $M$. The same remark holds true for the first canonical form. As we shall see in the next section, it is no longer true for the second canonical form, which is therefore less canonical.

We end this paragraph with a characterization of the minimal polynomial.
Theorem 6.3.5 Let $k$ be a field, $M \in \mathbf{M}_n(k)$ a square matrix, and $P_1, \dots, P_n$ its similarity invariants. Then $P_n$ is the minimal polynomial of $M$. In particular, the minimal polynomial does not depend on the field under consideration, as long as it contains the entries of $M$.

Proof

We use the first canonical form $M'$ of $M$. Since $M'$ and $M$ are similar, they have the same minimal polynomial. One thus can assume that $M$ is in the canonical form $M = \operatorname{diag}(M_1, \dots, M_n)$, where $M_j$ is the companion matrix of $P_j$.

Since $P_j(M_j) = 0$ (Cayley–Hamilton, Theorem 2.5.1) and $P_j \mid P_n$, we have $P_n(M_j) = 0$ and thus $P_n(M) = 0_n$. Hence the minimal polynomial $Q_M$ divides $P_n$. Conversely, $Q_M(M) = 0_n$ implies $Q_M(M_n) = 0$. Since $P_n$ is the minimal polynomial of its companion matrix $M_n$, $P_n$ divides $Q_M$. Finally, $P_n = Q_M$.

Moreover, since the similarity invariants do not depend on the choice of the field, $P_n$ does not depend on this choice either.

■ 

Warning: One may draw an incorrect conclusion if one applies Theorem 6.3.5 carelessly. Given a matrix $M \in \mathbf{M}_n(\mathbb{Z})$, one can define a matrix $M_{(p)}$ in $\mathbf{M}_n(\mathbb{Z}/p\mathbb{Z})$ by reduction modulo $p$ ($p$ a prime number). But the minimal polynomial of $M_{(p)}$ is not necessarily the reduction modulo $p$ of $Q_M$. Here is an example: Let us take $n = 2$ and




Then $Q_M$ divides $P_M = (X - 2)^2$, but $Q_M \neq X - 2$, since $M \neq 2I_2$. Hence $Q_M = (X - 2)^2$. On the other hand, $M_{(2)} = 0_2$, whose minimal polynomial is $X$. This differs from $X^2$, the reduction modulo 2 of $Q_M$.

The explanation of this phenomenon is the following. The matrices $M$ and $M_{(2)}$ are composed of scalars of different natures. There is no field $L$ containing simultaneously $\mathbb{Z}$ and $\mathbb{Z}/2\mathbb{Z}$. There is thus no context in which Theorem 6.3.5 could be applied.
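For a concrete instance, take the matrix $M = \begin{pmatrix} 2 & 2 \\ 0 & 2 \end{pmatrix}$. This is a hypothetical example of our own, not necessarily the one intended in the text. Its characteristic polynomial is $(X - 2)^2$, it differs from $2I_2$, and all of its entries are even.

The sketch checks three facts:

- $M - 2I_2 \neq 0$.
- $(M - 2I_2)^2 = 0$, so $Q_M = (X - 2)^2$ over $\mathbb{Q}$.
- The reduction $M_{(2)}$ is the zero matrix, whose minimal polynomial is $X$.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_zero(A):
    return all(x == 0 for row in A for x in row)


M = [[2, 2], [0, 2]]  # hypothetical example: even entries, characteristic polynomial (X - 2)^2
N = [[M[i][j] - (2 if i == j else 0) for j in range(2)] for i in range(2)]  # M - 2 I_2

# Over Q: X - 2 does not annihilate M, but (X - 2)^2 does, so Q_M = (X - 2)^2.
degree_one_kills = is_zero(N)
degree_two_kills = is_zero(matmul(N, N))

# Reduction modulo 2 gives the zero matrix, whose minimal polynomial is X, not X^2.
M2 = [[x % 2 for x in row] for row in M]
reduction_is_zero = is_zero(M2)
```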



6.3.3 Second Canonical Form of a Square Matrix 

We now decompose the similarity invariants of $M$ into products of irreducible polynomials. This decomposition depends, of course, on the choice of the field of scalars. Denoting by $p_1, \dots, p_t$ the list of distinct irreducible (in $k[X]$) factors of $P_n$, we have

$$P_j = \prod_{k=1}^{t} p_k^{\alpha(j,k)}, \qquad 1 \leq j \leq n$$

(because $P_j$ divides $P_n$). The exponents $\alpha(j,k)$ are nondecreasing with respect to $j$, since $P_j$ divides $P_{j+1}$.
Definition 6.3.2 The elementary divisors of the matrix $M \in \mathbf{M}_n(k)$ are the polynomials $p_k^{\alpha(j,k)}$ for which the exponent $\alpha(j,k)$ is nonzero. The multiplicity of an elementary divisor $p_k^m$ is the number of solutions $j$ of the equation $\alpha(j,k) = m$. The list of elementary divisors is the sequence of these polynomials, repeated with their multiplicities.

Let us begin with the case of the companion matrix $N$ of some polynomial $P$. Its similarity invariants are $(1, \dots, 1, P)$ (see above). Let $Q_1, \dots, Q_t$ be its elementary divisors (we observe that each has multiplicity one). We then have $P = Q_1 \cdots Q_t$, while the $Q_i$'s are pairwise coprime. To each $Q_i$ we associate its companion matrix $N_i$, of size $n_i = \deg Q_i$, and we form the block-diagonal matrix $N' := \operatorname{diag}(N_1, \dots, N_t)$. Each $N_i - XI_{n_i}$ is equivalent to the diagonal matrix

$$\operatorname{diag}(1, \dots, 1, Q_i)$$

in $\mathbf{M}_{n_i}(k[X])$. Hence the whole matrix $N' - XI_n$ is equivalent to

$$\mathcal{Q} := \operatorname{diag}(1, \dots, 1, Q_1, 1, \dots, 1, Q_2, \dots, 1, \dots, 1, Q_t).$$

Let us now compute the similarity invariants of $N'$, that is, the invariant factors of $\mathcal{Q}$. It will be enough to compute the greatest common divisor $D_{n-1}$ of the minors of size $n - 1$. Taking into account the principal minors of $\mathcal{Q}$, we see that $D_{n-1}$ must divide every product of the form

$$\prod_{l \neq k} Q_l, \qquad 1 \leq k \leq t.$$

Since the $Q_i$'s are pairwise coprime, this implies that $D_{n-1} = 1$. This means that the list of similarity invariants of $N'$ has the form $(1, \dots, 1, \cdot)$, where the last polynomial must be the characteristic polynomial of $N'$. This polynomial is the product of the characteristic polynomials of the $N_i$'s. These are equal to the $Q_i$'s, so the characteristic polynomial of $N'$ is $P$. Finally, $N$ and $N'$ have the same similarity invariants and are therefore similar.

Now let $M$ be a general matrix in $\mathbf{M}_n(k)$. We apply the former reduction to every diagonal block $M_j$ of its Frobenius canonical form. Each $M_j$ is similar to a block-diagonal matrix whose diagonal blocks are companion matrices. These correspond to the elementary divisors of $M$ that enter into the factorization of the $j$th invariant polynomial of $M$. We have thus proved the following statement.

Theorem 6.3.6 Let $Q_1, \dots, Q_s$ be the elementary divisors of $M \in \mathbf{M}_n(k)$. Then $M$ is similar to a block-diagonal matrix $M'$ whose diagonal blocks are companion matrices of the $Q_i$'s.

The matrix $M'$ is called the second canonical form of $M$.

Remark: The exact computation of the second canonical form of a given matrix is impossible in general, in contrast to the case of the first form. Indeed, an algorithmic construction would provide an algorithm for factorizing polynomials into irreducible factors via the formation of the companion matrix. This task is known to be impossible if $k = \mathbb{R}$ or $\mathbb{C}$. Recall that one of the most important results in Galois theory, known as Abel's theorem, states the impossibility of solving a general polynomial equation of degree at least five with complex coefficients, using only the basic operations and the extraction of roots of any order.



6.3.4 Jordan Form of a Matrix 



The characteristic polynomial splits over $k$, for instance, when the field $k$ is algebraically closed. In that case, the elementary divisors have the form $(X - a)^r$ for $a \in k$ and $r \geq 1$. The second canonical form can then be greatly simplified by replacing the companion matrix of the polynomial $(X - a)^r$ by its Jordan block

$$J(a; r) = \begin{pmatrix} a & 1 & 0 & \cdots & 0 \\ 0 & a & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & a & 1 \\ 0 & \cdots & \cdots & 0 & a \end{pmatrix}.$$

In fact, the characteristic polynomial of $J(a; r)$ (of size $r \times r$) is $(X - a)^r$. Moreover, the matrix $XI_r - J(a; r)$ possesses an invertible minor of order $r - 1$, namely

$$\det\begin{pmatrix} -1 & 0 & \cdots & \cdots & 0 \\ X - a & -1 & \ddots & & \vdots \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & X - a & -1 & 0 \\ 0 & \cdots & 0 & X - a & -1 \end{pmatrix} = (-1)^{r-1},$$

which is obtained by deleting the first column and the last row. Again, this shows that $D_{r-1}(XI_r - J) = 1$, so that the invariant factors $d_1, \dots, d_{r-1}$ are equal to 1. Hence $d_r = D_r(XI_r - J) = \det(XI_r - J) = (X - a)^r$. The invariant factors of $XI_r - J(a; r)$ are thus $1, \dots, 1, (X - a)^r$. Hence we have the following theorem.

Theorem 6.3.7 When an elementary divisor of $M$ is $(X - a)^r$, one may, in the second canonical form of $M$, replace its companion matrix by the Jordan block $J(a; r)$.

Corollary 6.3.1 If the characteristic polynomial of $M$ splits over $k$, then $M$ is similar to a block-diagonal matrix whose $j$th diagonal block is a Jordan block $J(a_j; r_j)$. This form is unique, up to the order of blocks.

Corollary 6.3.2 If $k$ is algebraically closed (for example if $k = \mathbb{C}$), then every square matrix $M$ is similar to a block-diagonal matrix whose $j$th diagonal block is a Jordan block $J(a_j; r_j)$. This form is unique, up to the order of blocks.
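The invariant factors $1, \dots, 1, (X - a)^r$ say in particular that $(X - a)^r$ is also the minimal polynomial of $J(a; r)$. Equivalently, $N = J(a; r) - aI_r$ satisfies $N^{r-1} \neq 0$ and $N^r = 0$.

The sketch below checks this numerically. The helper names are hypothetical. It builds the Jordan block and returns the least $k$ with $(J - aI_r)^k = 0$.

```python
def jordan_block(a, r):
    """J(a; r): a on the diagonal, 1 on the superdiagonal, 0 elsewhere."""
    return [[a if i == j else (1 if j == i + 1 else 0) for j in range(r)] for i in range(r)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def nilpotency_index(J, a):
    """Least k >= 1 with (J - a I)^k = 0, or None if no such k <= size of J exists."""
    r = len(J)
    N = [[J[i][j] - (a if i == j else 0) for j in range(r)] for i in range(r)]
    power = N
    for k in range(1, r + 1):
        if all(x == 0 for row in power for x in row):
            return k
        power = matmul(power, N)
    return None
```

For a $4 \times 4$ block, `nilpotency_index(jordan_block(3, 4), 3)` returns `4`. For a diagonal matrix with two distinct eigenvalues, `nilpotency_index([[2, 0], [0, 3]], 2)` returns `None`.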



6.4 Exercises 

See also Exercise 12 in Chapter 7.

1. Show that every principal ideal domain is a unique factorization 
domain. 

2. Verify that the characteristic polynomial of the companion matrix of 
a polynomial P is equal to P. 

3. Let $k$ be a field and $M \in \mathbf{M}_n(k)$. Show that $M$ and $M^T$ have the same rank and that, in general, the rank of $M^TM$ is less than or equal to that of $M$. Show that the equality of these ranks always holds if $k = \mathbb{R}$, but that strict inequality is possible, for example with $k = \mathbb{C}$.

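4. Compute the elementary divisors of the matrices

$$\begin{pmatrix} 22 & 23 & 10 & -98 \\ 12 & 18 & 16 & -38 \\ -15 & -19 & -13 & 58 \\ 6 & 7 & 4 & -25 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -21 & -56 & -96 \\ 18 & 36 & 52 & -8 \\ -12 & -17 & -16 & 38 \\ 3 & 2 & -2 & -20 \end{pmatrix},$$

and

$$\begin{pmatrix} 44 & 89 & 120 & -32 \\ 0 & -12 & -32 & -56 \\ -14 & -20 & -16 & 49 \\ 8 & 14 & 16 & -16 \end{pmatrix}$$

in $\mathbf{M}_4(\mathbb{Q})$. What are their Jordan reductions?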



5. (Lagrange’s theorem) 



Let $K$ be a field and $A \in \mathbf{M}_n(K)$. Let $X, Y \in K^n$ be vectors such that $X^TAY \neq 0$. We normalize by $X^TAY = 1$ and define

$$B := A - (AY)(X^TA).$$

Show that in a factorization

$$PAQ = D, \qquad P, Q \in \mathbf{GL}_n(K),$$

with $D$ as in Theorem 6.2.2, one can choose $Y$ as the first column of $Q$ and $X^T$ as the first row of $P$. Deduce that $\operatorname{rk} B = \operatorname{rk} A - 1$.

More generally, let $X, Y \in \mathbf{M}_{n\times m}(K)$ be such that $X^TAY \in \mathbf{GL}_m(K)$, and let

$$B := A - (AY)(X^TAY)^{-1}(X^TA).$$

Show that $\operatorname{rk} B = \operatorname{rk} A - m$.

If $A \in \mathbf{Sym}_n(\mathbb{R})$ is positive semidefinite and if $X = Y$, show that $B$ is also positive semidefinite.

6. For $A \in \mathbf{M}_n(\mathbb{C})$, consider the linear differential equation in $\mathbb{C}^n$

$$\frac{dx}{dt} = Ax. \qquad (6.1)$$

(a) Let $P \in \mathbf{GL}_n(\mathbb{C})$ and let $t \mapsto x(t)$ be a solution of (6.1). What is the differential equation satisfied by $t \mapsto Px(t)$?

(b) Let $(X - a)^m$ be an elementary divisor of $A$. Show that for every $k = 0, \dots, m - 1$, (6.1) possesses solutions of the form $e^{at}Q_k(t)$, where $Q_k$ is a polynomial map of degree $k$ with values in $\mathbb{C}^n$.



7. Consider the following differential equation of order $n$ in $\mathbb{C}$:

$$x^{(n)}(t) + a_1x^{(n-1)}(t) + \cdots + a_nx(t) = 0. \qquad (6.2)$$

(a) Define $P(X) = X^n + a_1X^{n-1} + \cdots + a_n$ and let $M$ be the companion matrix of $P$. Let

$$P(X) = \prod_{a \in A}(X - a)^{n_a}$$

be the factorization of $P$ into irreducible factors. Compute the Jordan form of $M$.

(b) Using either the previous exercise or arguing directly, show that the set of solutions of (6.2) is spanned by the solutions of the form

$$t \mapsto e^{at}R(t), \qquad R \in \mathbb{C}[X], \quad \deg R < n_a.$$

8. Consider a linear recursion of order $n$ in a field $K$:

$$u_{m+n} + \sum_{l=1}^{n} a_lu_{m+n-l} = 0, \qquad m \in \mathbb{N}. \qquad (6.3)$$

With the notation of the previous exercise, show that the set of solutions of (6.3) is spanned by the solutions of the form

$$(a^mR(m))_{m\in\mathbb{N}}, \qquad R \in \mathbb{C}[X], \quad \deg R < n_a.$$

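9. Let $n \geq 2$ and let $M \in \mathbf{M}_n(\mathbb{Z})$ be the matrix defined by $m_{ij} = i + j - 1$:

$$M = \begin{pmatrix} 1 & 2 & \cdots & n \\ 2 & 3 & \cdots & n+1 \\ \vdots & \vdots & & \vdots \\ n & n+1 & \cdots & 2n-1 \end{pmatrix}.$$

(a) Show that $M$ has rank 2 (you may look for two vectors $x, y \in \mathbb{Z}^n$ such that $m_{ij} = x_ix_j - y_iy_j$).

(b) Compute the invariant factors of $M$ in $\mathbf{M}_n(\mathbb{Z})$ (the equivalent diagonal form is obtained after five elementary operations).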



10. The ground field is $\mathbb{C}$.

(a) Define $N = J(0; n)$, and let $B \in \mathbf{M}_n(\mathbb{C})$ be the matrix with entries $b_{ij} = 1$ if $i + j = n + 1$ and $b_{ij} = 0$ otherwise. Compute $NB$, $BN$, and $BNB$. Show that $S := \frac{1}{\sqrt{2}}(I_n + iB)$ is unitary.

(b) Deduce that $N$ is similar to the matrix

$$\frac{1}{2}(N + N^T) + \frac{i}{2}(BN - NB).$$

(c) Deduce that every matrix $M \in \mathbf{M}_n(\mathbb{C})$ is similar to a complex symmetric matrix. Compare with the real case.




7 

Exponential of a Matrix, Polar 
Decomposition, and Classical Groups 



7.1 The Polar Decomposition 

The polar decomposition of matrices is defined by analogy with that in the complex plane. If $z \in \mathbb{C}^*$, there exists a unique pair $(r, q) \in (0, +\infty) \times S^1$ such that $z = rq$, where $S^1$ denotes the unit circle, the set of complex numbers of modulus 1.

If $z$ acts on $\mathbb{C}$ (or on $\mathbb{C}^*$) by multiplication, this action can be decomposed as the product of a rotation of angle $\theta$ (where $q = \exp(i\theta)$) with a homothety of ratio $r > 0$. These two actions commute because the multiplicative group $\mathbb{C}^*$ is commutative. This property does not hold for the polar decomposition in $\mathbf{GL}_n(k)$, $k = \mathbb{R}$ or $\mathbb{C}$, because the general linear group is not commutative.

Let us recall that $\mathbf{HPD}_n$ denotes the (open) cone of matrices of $\mathbf{M}_n(\mathbb{C})$ that are Hermitian positive definite, while $\mathbf{U}_n$ denotes the group of unitary matrices. In $\mathbf{M}_n(\mathbb{R})$, $\mathbf{SPD}_n$ is the set of symmetric positive definite matrices, and $\mathbf{O}_n$ is the orthogonal group.

The group $\mathbf{U}_n$ is compact, since it is closed and bounded in $\mathbf{M}_n(\mathbb{C})$. Indeed, the columns of unitary matrices are unit vectors, so that $\mathbf{U}_n$ is bounded. On the other hand, $\mathbf{U}_n$ is defined by the equation $U^*U = I_n$, where the map $U \mapsto U^*U$ is continuous; hence $\mathbf{U}_n$ is closed. By the same arguments, $\mathbf{O}_n$ is compact.

Polar decomposition is a fundamental tool in the theory of finite- 
dimensional Lie groups and Lie algebras. For this reason, it is intimately 
related to the exponential map. We shall not consider these two notions 
here in their full generality, but we shall restrict attention to their matricial 
aspects. 
Theorem 7.1.1 For every $M \in \mathbf{GL}_n(\mathbb{C})$, there exists a unique pair

$$(H, Q) \in \mathbf{HPD}_n \times \mathbf{U}_n$$

such that $M = HQ$. If $M \in \mathbf{GL}_n(\mathbb{R})$, then $(H, Q) \in \mathbf{SPD}_n \times \mathbf{O}_n$.

The map $M \mapsto (H, Q)$, called the polar decomposition of $M$, is a homeomorphism between $\mathbf{GL}_n(\mathbb{C})$ and $\mathbf{HPD}_n \times \mathbf{U}_n$ (respectively between $\mathbf{GL}_n(\mathbb{R})$ and $\mathbf{SPD}_n \times \mathbf{O}_n$).

Theorem 7.1.2 Let $H$ be a positive definite Hermitian matrix. There exists a unique positive definite Hermitian matrix $h$ such that $h^2 = H$. If $H$ is real-valued, then so is $h$. The matrix $h$ is called the square root of $H$, and is denoted by $h = \sqrt{H}$.

Proof 

We prove Theorem 7.1.1 and obtain Theorem 7.1.2 as a by-product. 

Existence. Since $MM^* \in \mathbf{HPD}_n$, we can diagonalize $MM^*$ by a unitary matrix:

$$MM^* = U^*DU, \qquad D = \operatorname{diag}(d_1, \dots, d_n),$$

where $d_j \in (0, +\infty)$. The matrix $H := U^*\operatorname{diag}(\sqrt{d_1}, \dots, \sqrt{d_n})U$ is Hermitian positive definite and satisfies $H^2 = HH^* = MM^*$. Then $Q := H^{-1}M$ satisfies $Q^*Q = M^*H^{-2}M = M^*(MM^*)^{-1}M = I_n$; hence $Q \in \mathbf{U}_n$.

If $M \in \mathbf{M}_n(\mathbb{R})$, then clearly $MM^*$ is real symmetric. In that case, $U$ can be chosen orthogonal, so that $H$ is real symmetric. Hence $Q$ is real orthogonal. Note: $H$ is called the square root of $MM^*$.

Uniqueness. Let $M = H'Q'$ be another suitable decomposition. Then $N := H^{-1}H' = Q(Q')^{-1}$ is unitary, so that $\mathrm{Sp}(N) \subset S^1$. Let $S \in \mathbf{HPD}_n$ be a positive definite Hermitian square root of $H'$ (we shall prove below that it is unique). Then $N$ is similar to $N' := SH^{-1}S$. However, $N' \in \mathbf{HPD}_n$. Hence $N$ is diagonalizable, with real positive eigenvalues. Hence $\mathrm{Sp}(N) = \{1\}$, and $N$ is therefore similar, and thus equal, to $I_n$.

This proves that the positive definite Hermitian square root of a matrix of $\mathbf{HPD}_n$ is unique in $\mathbf{HPD}_n$. Otherwise, our construction would provide several polar decompositions. We have thus proved Theorem 7.1.2 in passing.

Continuity. The map $(H, Q) \mapsto HQ$ is polynomial, hence continuous. For the converse, it is enough to prove that $M \mapsto (H, Q)$ is sequentially continuous, since $\mathbf{GL}_n(\mathbb{C})$ is a metric space.

Let $(M_k)_{k\in\mathbb{N}}$ be a sequence in $\mathbf{GL}_n(\mathbb{C})$ converging to a limit $M \in \mathbf{GL}_n(\mathbb{C})$. Let us denote by $M_k = H_kQ_k$ and $M = HQ$ their respective polar decompositions. Let $R$ be a cluster point of the sequence $(Q_k)_{k\in\mathbb{N}}$, that is, the limit of some subsequence $(Q_{k_l})_{l\in\mathbb{N}}$, with $k_l \to +\infty$. Then $R \in \mathbf{U}_n$, since $\mathbf{U}_n$ is closed, and $H_{k_l} = M_{k_l}Q_{k_l}^*$ converges to $S := MR^*$.

The matrix $S$ is Hermitian positive semidefinite, because it is the limit of the $H_{k_l}$'s. It is also invertible, because it is the product of $M$ and $R^*$. It is thus positive definite. Hence $SR$ is a polar decomposition of $M$. The uniqueness part ensures that $R = Q$ and $S = H$.

The sequence $(Q_k)_{k\in\mathbb{N}}$ is relatively compact and has at most one cluster point (namely $Q$), so it converges to $Q$. Finally, $H_k = M_kQ_k^*$ converges to $MQ^* = H$.

■ 

Remark: There is also a polar decomposition $M = QH$ with the same properties. We shall use one or the other depending on the context. We warn the reader, however, that for a given matrix the two decompositions do not coincide. For example, in $M = HQ$, $H$ is the square root of $MM^*$, whereas in $M = QH$, it is the square root of $M^*M$.
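The proof constructs $H$ from a unitary diagonalization of $MM^*$. The sketch below uses a different, standard numerical technique instead, for real matrices only: the Newton iteration

$$X_{k+1} = \tfrac{1}{2}(X_k + X_k^{-T}), \qquad X_0 = M.$$

Its limit is the orthogonal polar factor $Q$ in $M = QH'$ with $H' \in \mathbf{SPD}_n$. Since $M = QH'$ can be rewritten $M = (QH'Q^T)Q$, the same $Q$ serves for $M = HQ$, and then $H = MQ^T$.

Plain Python lists, the tolerance and the function names are our own choices.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Inverse by Gauss-Jordan elimination with partial pivoting (A must be invertible)."""
    n = len(A)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def polar(M, tol=1e-13, max_iter=100):
    """Return (H, Q) with M = H Q, H symmetric positive definite, Q orthogonal (real M in GL_n)."""
    X = [[float(x) for x in row] for row in M]
    for _ in range(max_iter):
        Y = transpose(inverse(X))  # X^{-T}
        new = [[(x + y) / 2 for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
        change = max(abs(a - b) for ra, rb in zip(new, X) for a, b in zip(ra, rb))
        X = new
        if change < tol:
            break
    Q = X
    H = matmul(M, transpose(Q))  # M = H Q implies H = M Q^{-1} = M Q^T
    return H, Q
```

The complex case would replace the transpose by the conjugate transpose.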



7.2 Exponential of a Matrix 



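The ground field is here $k = \mathbb{C}$. By restriction, we can also treat the case $k = \mathbb{R}$.

For $A$ in $\mathbf{M}_n(\mathbb{C})$, the series

$$\sum_{k=0}^{\infty} \frac{1}{k!}A^k$$

converges normally, which means that the series of norms is convergent. Indeed, for any matrix norm, we have

$$\sum_{k=0}^{\infty} \frac{1}{k!}\|A^k\| \leq \sum_{k=0}^{\infty} \frac{1}{k!}\|A\|^k = \exp\|A\|.$$

Since $\mathbf{M}_n(\mathbb{C})$ is complete, the series is convergent, and the estimate above shows that it converges uniformly on every compact set. Its sum, denoted by $\exp A$, thus defines a continuous map $\exp : \mathbf{M}_n(\mathbb{C}) \to \mathbf{M}_n(\mathbb{C})$, called the exponential. When $A \in \mathbf{M}_n(\mathbb{R})$, we have $\exp A \in \mathbf{M}_n(\mathbb{R})$.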

Given two matrices $A$ and $B$ in general position, the binomial formula is not valid: $(A + B)^k$ does not necessarily coincide with

$$\sum_{j=0}^{k} \binom{k}{j} A^jB^{k-j}.$$

It thus follows that $\exp(A + B)$ differs in general from $\exp A \cdot \exp B$. A correct statement is the following.



Proposition 7.2.1 Let $A, B \in \mathbf{M}_n(\mathbb{C})$ be commuting matrices; that is, $AB = BA$. Then $\exp(A + B) = (\exp A)(\exp B)$.



Proof 

The proof proceeds in exactly the same way as for the exponential of complex numbers. Since the series defining the exponential of a matrix is normally convergent, we may compute the product $(\exp A)(\exp B)$ by multiplying the two series term by term:

$$(\exp A)(\exp B) = \sum_{j,k=0}^{\infty} \frac{1}{j!\,k!}A^jB^k.$$

In other words,

$$(\exp A)(\exp B) = \sum_{l=0}^{\infty} \frac{1}{l!}C_l, \qquad \text{where } C_l := \sum_{j+k=l} \frac{l!}{j!\,k!}A^jB^k.$$

From the assumption $AB = BA$, we know that the binomial formula holds. Therefore, $C_l = (A + B)^l$, which proves the proposition.
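Proposition 7.2.1 is easy to test numerically. The sketch below truncates the exponential series after 40 terms. In view of the bound $\exp\|A\|$, this truncation is harmless for matrices of small norm.

The sketch compares $\exp(A + B)$ with $(\exp A)(\exp B)$ twice:

- first for two commuting upper triangular matrices;
- then for the non-commuting pair $E_{12}$, $E_{21}$.

The function names are ours.

```python
def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def expm(A, terms=40):
    """Truncated series sum_{k < terms} A^k / k!."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]  # now A^k / k!
        total = add(total, term)
    return total


def gap(A, B):
    """Largest entry of |exp(A + B) - exp(A) exp(B)|."""
    lhs = expm(add(A, B))
    rhs = matmul(expm(A), expm(B))
    return max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))


# Matrices of the form a I + b E_12 commute with each other.
A = [[0.5, 1.0], [0.0, 0.5]]
B = [[-0.2, 0.7], [0.0, -0.2]]
commuting_gap = gap(A, B)

# E_12 and E_21 do not commute: exp(E_12 + E_21) has cosh 1 on the diagonal,
# whereas exp(E_12) exp(E_21) = [[2, 1], [1, 1]].
E12 = [[0.0, 1.0], [0.0, 0.0]]
E21 = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = gap(E12, E21)
```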



Noting that $\exp 0_n = I_n$ and that $A$ and $-A$ commute, we derive the following corollary.

Corollary 7.2.1 For every $A \in \mathbf{M}_n(\mathbb{C})$, $\exp A$ is invertible, and its inverse is $\exp(-A)$.

Given two conjugate matrices $B = P^{-1}AP$, we have $B^k = P^{-1}A^kP$ for each integer $k$ and thus

$$\exp(P^{-1}AP) = P^{-1}(\exp A)P. \qquad (7.1)$$

If $D = \operatorname{diag}(d_1, \dots, d_n)$ is diagonal, we have

$$\exp D = \operatorname{diag}(\exp d_1, \dots, \exp d_n).$$

Of course, this formula, or more generally (7.1), can be combined with Jordan reduction in order to compute the exponential of a given matrix. Let us keep in mind, however, that Jordan reduction cannot, in general, be carried out explicitly.

Let us introduce a real parameter $t$ and let us define a function $g$ by $g(t) = \exp tA$. From Proposition 7.2.1, we see that $g$ satisfies the functional equation

$$g(s + t) = g(s)g(t). \qquad (7.2)$$

On the other hand, $g(0) = I_n$ and we have

$$\frac{g(t) - g(0)}{t} - A = \sum_{k=2}^{\infty} \frac{t^{k-1}}{k!}A^k.$$

Using any matrix norm, we deduce that

$$\left\|\frac{g(t) - g(0)}{t} - A\right\| \leq \frac{e^{\|tA\|} - 1 - \|tA\|}{|t|},$$

from which we obtain

$$\lim_{t \to 0} \frac{g(t) - g(0)}{t} = A.$$



We conclude that $g$ has a derivative at $t = 0$, with $g'(0) = A$. Using the functional equation (7.2), we then obtain that $g$ is differentiable everywhere, with

$$g'(t) = \lim_{s \to 0} \frac{g(t)g(s) - g(t)}{s} = g(t)A.$$

We observe that we also have

$$g'(t) = \lim_{s \to 0} \frac{g(s)g(t) - g(t)}{s} = Ag(t).$$

From either of these differential equations we see that $g$ is actually infinitely differentiable. We shall retain the formula

$$\frac{d}{dt}\exp tA = A\exp tA = (\exp tA)A. \qquad (7.3)$$

This differential equation is sometimes the most practical way to compute the exponential of a matrix. This is particularly relevant when $A$ has real entries but at least one nonreal eigenvalue, and one wishes to avoid the use of complex numbers.
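To illustrate this remark, the sketch below computes $\exp A$ for the real matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, whose eigenvalues are $\pm i$, without any complex arithmetic. It integrates $\frac{d}{dt}X = AX$, $X(0) = I_2$, up to $t = 1$.

The integrator (the classical fourth-order Runge–Kutta scheme) and the step count are our own choices. The result should be close to the rotation matrix of angle 1.

```python
import math


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def axpy(X, h, K):
    """Entrywise X + h K."""
    return [[x + h * k for x, k in zip(rx, rk)] for rx, rk in zip(X, K)]


def exp_by_ode(A, t=1.0, steps=200):
    """Approximate exp(tA) by integrating X' = A X, X(0) = I, with classical RK4."""
    n = len(A)
    X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = matmul(A, X)
        k2 = matmul(A, axpy(X, h / 2, k1))
        k3 = matmul(A, axpy(X, h / 2, k2))
        k4 = matmul(A, axpy(X, h, k3))
        X = [[x + h / 6 * (a + 2 * b + 2 * c + d)
              for x, a, b, c, d in zip(rx, r1, r2, r3, r4)]
             for rx, r1, r2, r3, r4 in zip(X, k1, k2, k3, k4)]
    return X


# Real generator of plane rotations: eigenvalues +i and -i, yet only real arithmetic is used.
A = [[0.0, -1.0], [1.0, 0.0]]
R = exp_by_ode(A)
expected = [[math.cos(1.0), -math.sin(1.0)], [math.sin(1.0), math.cos(1.0)]]
error = max(abs(R[i][j] - expected[i][j]) for i in range(2) for j in range(2))
```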



Proposition 7.2.2 For every $A \in \mathbf{M}_n(\mathbb{C})$,

$$\det\exp A = \exp\operatorname{Tr}A. \qquad (7.4)$$

Proof 

We could deduce (7.4) directly from (7.3). Here is a more elementary proof. We begin with a reduction of $A$ of the form $A = P^{-1}TP$, where $T$ is upper triangular. Since $T^k$ is still triangular, with diagonal entries equal to $t_{jj}^k$, $\exp T$ is triangular too, with diagonal entries equal to $\exp t_{jj}$. Hence

$$\det\exp T = \prod_j \exp t_{jj} = \exp\sum_j t_{jj} = \exp\operatorname{Tr}T.$$

This is the expected formula, since $\exp A = P^{-1}(\exp T)P$ by (7.1), and since similar matrices have the same determinant and the same trace.
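Formula (7.4) can be checked numerically with a truncated exponential series. The sketch assumes that 40 terms are accurate enough for a matrix of small norm. The $3 \times 3$ sample matrix is arbitrary and the helper names are ours.

```python
import math


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def expm(A, terms=40):
    """Truncated exponential series sum_{k < terms} A^k / k!."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + x for s, x in zip(rs, rx)] for rs, rx in zip(total, term)]
    return total


def det3(M):
    """Determinant of a 3 x 3 matrix (cofactor expansion along the first row)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[0.3, -1.2, 0.5], [0.7, 0.1, -0.4], [0.2, 0.9, -0.6]]
lhs = det3(expm(A))                           # det exp A
rhs = math.exp(A[0][0] + A[1][1] + A[2][2])   # exp Tr A
```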



Since $(M^*)^k = (M^k)^*$, we see easily that $(\exp M)^* = \exp(M^*)$. In particular, the exponential of a skew-Hermitian matrix is unitary, for then

$$(\exp M)^*\exp M = \exp(M^*)\exp M = \exp(-M)\exp M = I_n.$$

Similarly, the exponential of a Hermitian matrix is Hermitian positive definite, because

$$\exp M = \left(\exp\tfrac{1}{2}M\right)^2 = \left(\exp\tfrac{1}{2}M\right)^*\exp\tfrac{1}{2}M,$$

where $\exp\frac{1}{2}M$ is invertible. This calculation also shows that if $M$ is Hermitian, then

$$\sqrt{\exp M} = \exp\tfrac{1}{2}M.$$

We shall use the following more precise statement:

Proposition 7.2.3 The map $\exp : \mathbf{H}_n \to \mathbf{HPD}_n$ is a homeomorphism (that is, a bicontinuous bijection).

Proof 

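Injectivity: Let $A, B \in \mathbf{H}_n$ with $\exp A = \exp B =: H$. Then

$$\exp\tfrac{1}{2}A = \sqrt{H} = \exp\tfrac{1}{2}B.$$

By induction, we have

$$\exp 2^{-m}A = \exp 2^{-m}B, \qquad m \in \mathbb{N}.$$

Subtracting $I_n$, multiplying by $2^m$, and passing to the limit as $m \to +\infty$, we obtain

$$\left.\frac{d}{dt}\exp tA\right|_{t=0} = \left.\frac{d}{dt}\exp tB\right|_{t=0};$$

that is, $A = B$.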

Surjectivity: Let $H \in \mathbf{HPD}_n$ be given. Then $H = U^*\operatorname{diag}(d_1, \dots, d_n)U$, where $U$ is unitary and $d_j \in (0, +\infty)$. From above, we know that $H = \exp M$ for

$$M := U^*\operatorname{diag}(\log d_1, \dots, \log d_n)U,$$

which is Hermitian.

Continuity: The continuity of $\exp$ has already been proved. Let us investigate the continuity of the reciprocal map. Let $(H^l)_{l\in\mathbb{N}}$ be a sequence in $\mathbf{HPD}_n$ that converges to $H \in \mathbf{HPD}_n$. We denote by $M^l, M \in \mathbf{H}_n$ the Hermitian matrices whose exponentials are $H^l$ and $H$. The continuity of the spectral radius and of inversion gives

$$\lim_{l\to+\infty}\rho(H^l) = \rho(H), \qquad \lim_{l\to+\infty}\rho\big((H^l)^{-1}\big) = \rho(H^{-1}). \qquad (7.5)$$

Since $\mathrm{Sp}(M^l) = \log\mathrm{Sp}(H^l)$, we have

$$\rho(M^l) = \log\max\{\rho(H^l), \rho((H^l)^{-1})\}. \qquad (7.6)$$

Recall that, on $\mathbf{H}_n$, the induced norm $\|\cdot\|_2$ coincides with the spectral radius $\rho$. We therefore deduce from (7.5) and (7.6) that the sequence $(M^l)_{l\in\mathbb{N}}$ is bounded. Let $N$ be a cluster point of the sequence. Then $N \in \mathbf{H}_n$, and the continuity of the exponential implies $\exp N = H$. But the injectivity shown above implies $N = M$. The sequence $(M^l)_{l\in\mathbb{N}}$, bounded with a unique cluster point, is convergent.



7.3 Structure of Classical Groups

Proposition 7.3.1 Let $G$ be a subgroup of $\mathbf{GL}_n(\mathbb{C})$. We assume that $G$ is stable under the map $M \mapsto M^*$ and that for every $M \in G \cap \mathbf{HPD}_n$, the square root $\sqrt{M}$ is an element of $G$. Then $G$ is stable under polar decomposition. Furthermore, polar decomposition is a homeomorphism between $G$ and

$$(G \cap \mathbf{U}_n) \times (G \cap \mathbf{HPD}_n).$$

This proposition applies in particular to subgroups of $\mathbf{GL}_n(\mathbb{R})$ that are stable under transposition and under extraction of square roots in $\mathbf{SPD}_n$. For such a subgroup one then has a homeomorphism

$$G \simeq (G \cap \mathbf{O}_n) \times (G \cap \mathbf{SPD}_n).$$

Proof 

Let $M \in G$ be given and let $HQ$ be its polar decomposition. Since $MM^* \in G \cap \mathbf{HPD}_n$, the assumption gives $\sqrt{MM^*} \in G$, that is, $H \in G$. Finally, we have $Q = H^{-1}M \in G$. An application of Theorem 7.1.1 finishes the proof.

■ 

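We apply this general result to the following classical groups:

- $\mathbf{U}(p,q)$, where $n = p + q$: the unitary group of the Hermitian form $|z_1|^2 + \cdots + |z_p|^2 - |z_{p+1}|^2 - \cdots - |z_n|^2$;
- $\mathbf{O}(p,q)$, where $n = p + q$: the orthogonal group of the quadratic form $x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_n^2$;
- $\mathbf{Sp}_m$, where $n = 2m$: the symplectic group.

They are defined by $G = \{M \in \mathbf{M}_n(k) \mid M^*JM = J\}$, with $k = \mathbb{C}$ for $\mathbf{U}(p,q)$, $k = \mathbb{R}$ otherwise. The matrix $J$ equals

$$\begin{pmatrix} I_p & 0_{p\times q} \\ 0_{q\times p} & -I_q \end{pmatrix}$$

for $\mathbf{U}(p,q)$ and $\mathbf{O}(p,q)$, and

$$\begin{pmatrix} 0_m & I_m \\ -I_m & 0_m \end{pmatrix}$$

for $\mathbf{Sp}_m$. In each case, $J^2 = \pm I_n$.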

Proposition 7.3.2 Let $J$ be a complex $n \times n$ matrix satisfying $J^2 = \pm I_n$. The subgroup $G$ of $\mathbf{GL}_n(\mathbb{C})$ defined by the equation $M^*JM = J$ is invariant under polar decomposition. If $M \in G$, then $|\det M| = 1$.

Proof 

The fact that $G$ is a group is immediate. Let $M \in G$. Then $\det J = \det M^* \det M \det J$; that is, $|\det M|^2 = 1$. Furthermore, $M^*JM(JM^*) = J^2M^* = \pm M^* = M^*J^2$. Simplifying by $M^*J$ on the left, there remains $MJM^* = J$, that is, $M^* \in G$.
Observe that, since $G$ is a group, $M \in G$ implies $(M^*)^kJ = JM^{-k}$ for every $k \in \mathbb{N}$. By linearity, it follows that $p(M^*)J = Jp(M^{-1})$ holds for every polynomial $p \in \mathbb{R}[X]$.

Let us now assume that $M \in G \cap \mathbf{HPD}_n$. We then have $M = U^*\operatorname{diag}(d_1, \dots, d_n)U$, where $U$ is unitary and the $d_j$'s are positive real numbers. Let $\mathcal{A}$ be the set formed by the numbers $d_j$ and $1/d_j$. There exists a polynomial $p$ with real coefficients such that $p(a) = \sqrt{a}$ for every $a \in \mathcal{A}$. Then we have $p(M) = \sqrt{M}$ and $p(M^{-1}) = \sqrt{M^{-1}} = (\sqrt{M})^{-1}$.

Since $M^* = M$, we also have $p(M)J = Jp(M^{-1})$; that is, $\sqrt{M}J = J(\sqrt{M})^{-1}$. As $\sqrt{M}$ is Hermitian, this reads $(\sqrt{M})^*J\sqrt{M} = J$. Hence $\sqrt{M} \in G$. From Proposition 7.3.1, $G$ is stable under polar decomposition.

■ 

The main result of this section is the following: 

Theorem 7.3.1 Under the hypotheses of Proposition 7.3.2, the group $G$ is homeomorphic to $(G \cap \mathbf{U}_n) \times \mathbb{R}^d$ for a suitable integer $d$.

Of course, if $G = \mathbf{O}(p,q)$ or $\mathbf{Sp}_m$, the subgroup $G \cap \mathbf{U}_n$ can also be written as $G \cap \mathbf{O}_n$. We call $G \cap \mathbf{U}_n$ a maximal compact subgroup of $G$, because one can prove that it is not a proper subgroup of a compact subgroup of $G$. Another deep result, which is beyond the scope of this book, is that every maximal compact subgroup of $G$ is a conjugate of $G \cap \mathbf{U}_n$. In the sequel, when speaking about the maximal compact subgroup of $G$, we shall always have in mind $G \cap \mathbf{U}_n$.

Proof 

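By Proposition 7.3.2, the polar decomposition is a homeomorphism from $G$ onto $(G \cap \mathbf{U}_n) \times (G \cap \mathbf{HPD}_n)$. The proof therefore amounts to showing that $G \cap \mathbf{HPD}_n$ is homeomorphic to some $\mathbb{R}^d$. To do this, we define

$$\mathfrak{g} := \{N \in \mathbf{M}_n(k) \mid \exp(tN) \in G,\ \forall t \in \mathbb{R}\}.$$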

Lemma 7.3.1 The set $\mathfrak{g}$ defined above satisfies

$$\mathfrak{g} = \{N \in \mathbf{M}_n(k) \mid N^*J + JN = 0_n\}.$$

Proof 

If $N^*J + JN = 0_n$, let us set $M(t) = \exp(tN)$. Then $M(0) = I_n$ and

$$\frac{d}{dt}\bigl(M(t)^*JM(t)\bigr) = M(t)^*(N^*J + JN)M(t) = 0_n,$$

so that $M(t)^*JM(t) = J$. We thus have $N \in \mathfrak{g}$. Conversely, if $M(t) := \exp(tN) \in G$ for every $t$, then the derivative at $t = 0$ of $M(t)^*JM(t) = J$ gives $N^*J + JN = 0_n$.

■ 

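Lemma 7.3.2 The map $\exp : \mathfrak{g} \cap \mathbf{H}_n \to G \cap \mathbf{HPD}_n$ is a homeomorphism.
Proof

Since $\exp$ is a homeomorphism from $\mathbf{H}_n$ onto $\mathbf{HPD}_n$ and maps $\mathfrak{g} \cap \mathbf{H}_n$ into $G \cap \mathbf{HPD}_n$, we need only show that $\exp : \mathfrak{g} \cap \mathbf{H}_n \to G \cap \mathbf{HPD}_n$ is onto. Let $M \in G \cap \mathbf{HPD}_n$, and let $N$ be the Hermitian matrix such that $\exp N = M$. Let $p \in \mathbb{R}[X]$ be a polynomial with real coefficients such that $p(\lambda) = \log\lambda$ for every $\lambda \in \operatorname{Sp} M \cup \operatorname{Sp} M^{-1}$. Such a polynomial exists, since the numbers $\lambda$ are real and positive.

Let $N = U^*DU$ be a unitary diagonalization of $N$. Then $M = \exp N = U^*(\exp D)U$ and $M^{-1} = \exp(-N) = U^*\exp(-D)U$. Hence $p(M) = N$ and $p(M^{-1}) = -N$. However, $M \in G$ implies $MJ = JM^{-1}$ (because $M^* = M$), and therefore $q(M)J = Jq(M^{-1})$ for every $q \in \mathbb{R}[X]$. With $q = p$, we obtain $NJ = -JN$; that is, $N^*J + JN = 0_n$, so that $N \in \mathfrak{g}$.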

■ 

These two lemmas complete the proof of the theorem, since $\mathfrak{g} \cap \mathbf{H}_n$ is an $\mathbb{R}$-vector space. The integer $d$ mentioned in the theorem is its dimension.

■ 

We wish to warn the reader that neither $\mathfrak{g}$ nor $\mathbf{H}_n$ is a $\mathbb{C}$-vector space. We shall see examples in the next section that show that $\mathfrak{g} \cap \mathbf{H}_n$ can be naturally $\mathbb{R}$-isomorphic to a $\mathbb{C}$-vector space, which is a source of confusion. One therefore must be cautious when computing $d$.
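Because $d$ is a real dimension, it can be computed by brute force. Choose a real basis of $\mathbf{H}_n$ (or of $\mathbf{Sym}_n$ in the real cases), write the real-linear map $N \mapsto N^*J + JN$ in that basis, and apply rank-nullity. The NumPy sketch below does this for $\mathbf{U}(p,q)$, $\mathbf{O}(p,q)$ and $\mathbf{Sp}_n$; the helper names are hypothetical. The values it returns agree with $d = 2pq$, $pq$ and $n(n+1)$, as found in the next sections.

```python
import numpy as np

def hermitian_basis(n, real=False):
    """A real basis of H_n (n^2 elements) or, if real=True, of Sym_n (n(n+1)/2 elements)."""
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
            if i < j and not real:
                F = np.zeros((n, n), dtype=complex)
                F[i, j], F[j, i] = 1j, -1j
                basis.append(F)
    return basis

def dim_g_cap_H(J, real=False):
    """d = dim_R { N in H_n (or Sym_n) : N J + J N = 0 }, by rank-nullity over R."""
    basis = hermitian_basis(J.shape[0], real)
    columns = []
    for N in basis:
        C = N @ J + J @ N                     # N* = N, so N*J + JN = NJ + JN
        columns.append(np.concatenate([C.real.ravel(), C.imag.ravel()]))
    A = np.array(columns).T                   # real matrix of the linear map
    return len(basis) - np.linalg.matrix_rank(A)

p, q = 2, 3
Jpq = np.diag([1.0] * p + [-1.0] * q)
I, Z = np.eye(3), np.zeros((3, 3))
Jsp = np.block([[Z, I], [-I, Z]])
print(dim_g_cap_H(Jpq), dim_g_cap_H(Jpq, real=True), dim_g_cap_H(Jsp, real=True))  # 12 6 12
```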

The reader eager to learn more about the theory of classical groups is 
advised to have a look at the book of R. Mneimne and F. Testard [28] or 
the one by A. W. Knapp [24]. 



7.4 The Groups $\mathbf{U}(p,q)$



Let us begin with the study of the maximal compact subgroup of $\mathbf{U}(p,q)$. If $M \in \mathbf{U}(p,q) \cap \mathbf{U}_n$, let us write $M$ blockwise:

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where $A \in \mathbf{M}_p(\mathbb{C})$, etc. The following equations express that $M$ belongs to $\mathbf{U}_n$:

$$A^*A + C^*C = I_p, \qquad B^*B + D^*D = I_q, \qquad A^*B + C^*D = 0.$$

Similarly, writing that $M \in \mathbf{U}(p,q)$, that is, $M^*J_{pq}M = J_{pq}$, we have

$$A^*A - C^*C = I_p, \qquad D^*D - B^*B = I_q, \qquad A^*B - C^*D = 0.$$

Combining these equations, we obtain first $C^*C = 0_p$ and $B^*B = 0_q$. For every vector $X \in \mathbb{C}^p$, we have $\|CX\|^2 = X^*C^*CX = 0$; hence $CX = 0$. Finally, $C = 0$, and similarly $B = 0$. There remain $A \in \mathbf{U}_p$ and $D \in \mathbf{U}_q$. The maximal compact subgroup of $\mathbf{U}(p,q)$ is thus isomorphic (not only homeomorphic) to $\mathbf{U}_p \times \mathbf{U}_q$.

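Furthermore, $\mathfrak{g} \cap \mathbf{H}_n$ is the set of matrices

$$N = \begin{pmatrix} A & B \\ B^* & D \end{pmatrix},$$

where $A \in \mathbf{H}_p$ and $D \in \mathbf{H}_q$, which satisfy $NJ + JN = 0_n$. Since $NJ + JN = \operatorname{diag}(2A, -2D)$, this means $A = 0_p$ and $D = 0_q$. Hence $\mathfrak{g} \cap \mathbf{H}_n$ is isomorphic to $\mathbf{M}_{p\times q}(\mathbb{C})$. One therefore has $d = 2pq$.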
Proposition 7.4.1 The unitary group $\mathbf{U}(p,q)$ is homeomorphic to $\mathbf{U}_p \times \mathbf{U}_q \times \mathbb{R}^{2pq}$. In particular, $\mathbf{U}(p,q)$ is connected.

It remains to show connectedness. This is a straightforward consequence of the following lemma.

Lemma 7.4.1 The unitary group $\mathbf{U}_n$ is connected.

Since $\mathbf{GL}_n(\mathbb{C})$ is homeomorphic to $\mathbf{U}_n \times \mathbf{HPD}_n$ (via polar decomposition), and hence to $\mathbf{U}_n \times \mathbf{H}_n$ (via the exponential), this lemma is equivalent to the following statement.

Lemma 7.4.2 The linear group $\mathbf{GL}_n(\mathbb{C})$ is connected.

Proof 

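Let $M \in \mathbf{GL}_n(\mathbb{C})$ be given. Define $A := \mathbb{C} \setminus \{(1-\lambda)^{-1} \mid \lambda \in \operatorname{Sp}(M),\ \lambda \neq 1\}$. The set $A$ is the complement of a finite set, so it is arcwise connected. It contains the origin, as well as the point $z = 1$, since $0 \notin \operatorname{Sp}(M)$. There thus exists a path $\gamma$ joining $0$ to $1$ in $A$: $\gamma \in C([0,1]; A)$, $\gamma(0) = 0$ and $\gamma(1) = 1$. Let us define $M(t) := \gamma(t)M + (1 - \gamma(t))I_n$. By construction, $M(t)$ is invertible for every $t$, because its eigenvalues $1 + \gamma(t)(\lambda - 1)$ never vanish. Moreover $M(0) = I_n$ and $M(1) = M$. The connected component of $I_n$ is thus all of $\mathbf{GL}_n(\mathbb{C})$.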
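The proof is constructive. The NumPy sketch below shows one way to choose the path $\gamma$; this choice is ours and is not prescribed by the text. It uses a parabolic arc $\gamma(t) = t + ict(1-t)$, with an integer $c$ selected so that the arc avoids the finitely many excluded points $(1-\lambda)^{-1}$. For $M = \operatorname{diag}(-1, -2, 3)$, the straight segment ($c = 0$) passes through a singular matrix, while the bent arc does not. The function name is hypothetical.

```python
import numpy as np

def gl_path(M, num=2001):
    """Sample t -> gamma(t) M + (1 - gamma(t)) I_n, from I_n to M, inside GL_n(C).

    gamma(t) = t + i c t (1 - t) joins 0 to 1; the integer c is chosen so that gamma
    never meets an excluded point 1/(1 - lambda), lambda in Sp(M), lambda != 1.
    """
    n = M.shape[0]
    excluded = [1 / (1 - lam) for lam in np.linalg.eigvals(M) if abs(1 - lam) > 1e-12]
    # The arc meets b = x + iy with 0 < x < 1 only if c = y / (x (1 - x)).
    bad_c = [b.imag / (b.real * (1 - b.real)) for b in excluded if 0 < b.real < 1]
    c = next(k for k in range(len(bad_c) + 1) if all(abs(k - v) > 1e-3 for v in bad_c))
    ts = np.linspace(0.0, 1.0, num)
    gammas = ts + 1j * c * ts * (1 - ts)
    return [g * M + (1 - g) * np.eye(n) for g in gammas]

M = np.diag([-1.0, -2.0, 3.0]).astype(complex)   # t M + (1 - t) I_n is singular at t = 1/2
path = gl_path(M)
print(min(abs(np.linalg.det(P)) for P in path))  # bounded away from 0
```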



7.5 The Orthogonal Groups $\mathbf{O}(p,q)$

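The analysis of the maximal compact subgroup and of $\mathfrak{g} \cap \mathbf{H}_n$ for the group $\mathbf{O}(p,q)$ is identical to that in the previous section. On the one hand, $\mathbf{O}(p,q) \cap \mathbf{O}_n$ is isomorphic to $\mathbf{O}_p \times \mathbf{O}_q$. On the other hand, $\mathfrak{g} \cap \mathbf{H}_n$ is isomorphic to $\mathbf{M}_{p\times q}(\mathbb{R})$, which is of dimension $d = pq$.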

Proposition 7.5.1 Let $n \geq 1$. The group $\mathbf{O}(p,q)$ is homeomorphic to $\mathbf{O}_p \times \mathbf{O}_q \times \mathbb{R}^{pq}$. It has two connected components if $p$ or $q$ is zero, and four otherwise.

Proof 

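We must show that $\mathbf{O}_n$ ($n \geq 1$) has two connected components. Now, $\mathbf{O}_n$ is the disjoint union of $\mathbf{SO}_n$ (matrices of determinant $+1$) and of $\mathbf{O}_n^-$ (matrices of determinant $-1$). Both sets are closed, since the determinant is continuous. Moreover, $\mathbf{O}_n^- = M \cdot \mathbf{SO}_n$ for any matrix $M \in \mathbf{O}_n^-$ (for example a hyperplane symmetry). It therefore remains to show that the special orthogonal group $\mathbf{SO}_n$ is connected, in fact arcwise connected. We use the following property: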
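Lemma 7.5.1 Given $M \in \mathbf{O}_n$, there exists $Q \in \mathbf{O}_n$ such that the matrix $Q^{-1}MQ$ has the block-diagonal form

$$\begin{pmatrix} (\cdot) & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & (\cdot) \end{pmatrix},$$

where the diagonal blocks are of size $1\times1$ or $2\times2$ and are orthogonal, those of size $2\times2$ being rotation matrices:

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \tag{7.8}$$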



Let us apply Lemma 7.5.1 to $M \in \mathbf{SO}_n$. The determinant of $M$ is the product of the determinants of the diagonal blocks, so it equals $(-1)^m$, where $m$ is the multiplicity of the eigenvalue $-1$. Since $\det M = 1$, $m$ is even, and we can gather the diagonal $-1$'s in pairs to form matrices of the form (7.8) with $\theta = \pi$. Finally, there exists $Q \in \mathbf{O}_n$ such that

$$M = Q^T \begin{pmatrix} R_1 & & & & & \\ & \ddots & & & & \\ & & R_r & & & \\ & & & 1 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix} Q$$

(all other entries being zero), where each diagonal block $R_j$ is a matrix of planar rotation:

$$R_j = \begin{pmatrix} \cos\theta_j & \sin\theta_j \\ -\sin\theta_j & \cos\theta_j \end{pmatrix}.$$

Let us now define a matrix $M(t)$ as above, replacing the angles $\theta_j$ by $t\theta_j$. We thus obtain a path in $\mathbf{SO}_n$ from $M(0) = I_n$ to $M(1) = M$. The connected component of $I_n$ is thus the whole of $\mathbf{SO}_n$.
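Here is a numerical version of this path, as a NumPy sketch. It does not compute the real block form of Lemma 7.5.1. Instead, as a substitute of our own choosing, it uses a complex eigendecomposition $M = V\operatorname{diag}(e^{i\theta_j})V^{-1}$ and replaces $e^{i\theta_j}$ by $e^{it\theta_j}$. For $|\theta_j| < \pi$ this gives the same path $M(t)$. The pairing of the eigenvalues $-1$ into rotations by $\pi$ is not handled, so the sketch assumes $-1 \notin \operatorname{Sp}(M)$, which holds for a generic rotation. The function names are hypothetical.

```python
import numpy as np

def random_rotation(n, seed=0):
    """A random element of SO_n: QR factor of a Gaussian matrix, with det corrected to +1."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def so_path(M, t):
    """M(t): each eigenvalue e^{i theta_j} (|theta_j| < pi) of M is replaced by e^{i t theta_j}.

    Assumes that -1 is not an eigenvalue of M (the generic case).
    """
    lam, V = np.linalg.eig(M)
    if np.any(np.abs(lam + 1) < 1e-8):
        raise ValueError("eigenvalue -1: pair these as rotations by pi first (not handled here)")
    lam_t = np.exp(t * np.log(lam.astype(complex)))      # principal logarithm gives i theta_j
    return (V @ np.diag(lam_t) @ np.linalg.inv(V)).real  # the imaginary part is rounding noise

M = random_rotation(4)
for t in (0.0, 0.5, 1.0):
    Mt = so_path(M, t)
    print(t, np.allclose(Mt.T @ Mt, np.eye(4)), round(np.linalg.det(Mt), 6))
```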



We now prove Lemma 7.5.1. As an orthogonal matrix, $M$ is normal. By Theorem 3.3.1, it decomposes into a matrix of the form (7.7), the $1\times1$ diagonal blocks being the real eigenvalues. These eigenvalues are $\pm1$, since $Q^{-1}MQ$ is orthogonal. The $2\times2$ diagonal blocks are direct similitude matrices. They are also isometries, since $Q^{-1}MQ$ is orthogonal. Hence they are rotation matrices.
7.5.1 Notable Subgroups of $\mathbf{O}(p,q)$

We assume here that $p, q \geq 1$, so that $\mathbf{O}(p,q)$ has four connected components. We first describe them.

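Let us recall that if $M \in \mathbf{O}(p,q)$ reads blockwise

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where $A \in \mathbf{M}_p(\mathbb{R})$, etc., then $A^TA = C^TC + I_p$ is larger than $I_p$ as a symmetric matrix, so that $\det A$ cannot vanish. Similarly, $D^TD = B^TB + I_q$ shows that $\det D$ does not vanish. The continuous map $M \mapsto (\det A, \det D)$ thus sends $\mathbf{O}(p,q)$ to $\mathbb{R}^* \times \mathbb{R}^*$ (in fact, to $(\mathbb{R} \setminus (-1,1))^2$). Since the sign map from $\mathbb{R}^*$ to $\{-,+\}$ is continuous, we may define a continuous function

$$\sigma : \mathbf{O}(p,q) \to \{-,+\}^2 \simeq (\mathbb{Z}/2\mathbb{Z})^2, \qquad M \mapsto (\operatorname{sgn}\det A, \operatorname{sgn}\det D).$$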



The diagonal matrices whose diagonal entries are $\pm1$ belong to $\mathbf{O}(p,q)$. It follows that $\sigma$ is onto. Since $\sigma$ is continuous, the preimage $G_\alpha$ of an element $\alpha$ of $\{-,+\}^2$ is a union of connected components of $\mathbf{O}(p,q)$; let $n(\alpha)$ be the number of these components. Then $n(\alpha) \geq 1$ ($\sigma$ being onto), and $\sum_\alpha n(\alpha)$ equals 4, the number of connected components of $\mathbf{O}(p,q)$. Since there are four terms in this sum, we obtain $n(\alpha) = 1$ for every $\alpha$. Finally, the connected components of $\mathbf{O}(p,q)$ are the sets $G_\alpha$, where $\alpha \in \{-,+\}^2$.

Left multiplication by an element $M$ of $\mathbf{O}(p,q)$ is continuous and bijective, and its inverse (another multiplication) is continuous. It thus induces a permutation of the set $\pi_0$ of connected components of $\mathbf{O}(p,q)$. Since $\sigma$ induces a bijection between $\pi_0$ and $\{-,+\}^2$, there exists a permutation $q_M$ of $\{-,+\}^2$ such that $\sigma(MM') = q_M(\sigma(M'))$. Similarly, right multiplication by $M'$ is a homeomorphism, which allows us to define a permutation $p_{M'}$ of $\{-,+\}^2$ such that $\sigma(MM') = p_{M'}(\sigma(M))$. Because $\sigma$ is onto, the equality

$$p_{M'}(\sigma(M)) = q_M(\sigma(M'))$$

shows that $p_{M'}$ and $q_M$ depend only on $\sigma(M')$ and $\sigma(M)$, respectively. In other words, $\sigma(MM')$ depends only on $\sigma(M)$ and $\sigma(M')$. A direct evaluation in the special case of matrices in $\mathbf{O}(p,q) \cap \mathbf{O}_n(\mathbb{R})$ leads to the following conclusion.



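Proposition 7.5.2 ($p, q \geq 1$) The connected components of $G = \mathbf{O}(p,q)$ are the sets $G_\alpha := \sigma^{-1}(\alpha)$, defined by $\alpha_1\det A > 0$ and $\alpha_2\det D > 0$, when a matrix $M$ is written blockwise as above. The map $\sigma : \mathbf{O}(p,q) \to \{-,+\}^2$ is a surjective group homomorphism; that is, $\sigma(MM') = \sigma(M)\sigma(M')$. In particular, $G_\alpha G_\beta = G_{\alpha\beta}$ for all $\alpha, \beta \in \{-,+\}^2$.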
Remark: $\sigma$ admits a right inverse, namely

$$\alpha \mapsto M^\alpha := \operatorname{diag}(\alpha_1, 1, \ldots, 1, \alpha_2).$$

The group $\mathbf{O}(p,q)$ therefore appears as the semidirect product of $G_{++}$ with $(\mathbb{Z}/2\mathbb{Z})^2$.

We deduce immediately from the proposition that $\mathbf{O}(p,q)$ possesses five open and closed normal subgroups, the preimages of the five subgroups of $(\mathbb{Z}/2\mathbb{Z})^2$:

• $\mathbf{O}(p,q)$ itself;

• $G_{++}$, which we also denote by $G_0$ (see Exercise 21), the connected component of the unit element;

• $G_{++} \cup G_\alpha$, for the three other choices of an element $\alpha$.

One of these groups, namely $G_{++} \cup G_{--}$, is equal to the kernel $\mathbf{SO}(p,q)$ of the homomorphism $M \mapsto \det M$. Indeed, this kernel is open and closed, and is thus a union of connected components of $\mathbf{O}(p,q)$. Moreover, the sign of $\det M$ for $M \in G_\alpha$ is that of $\alpha_1\alpha_2$, as can be seen directly from the diagonal matrices $M^\alpha$.
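The homomorphism property and the sign rule for $\det M$ can be checked numerically. In this NumPy sketch, test elements of $G_\alpha$ are produced as $M^\alpha\exp(JK)$, with $J = \operatorname{diag}(I_p, -I_q)$ and $K$ skew-symmetric. Then $JK \in \mathfrak{g}$, so $\exp(JK)$ lies in the connected component $G_{++}$. A small truncated-Taylor `expm` helper of our own stands in for a library matrix exponential. All names are hypothetical.

```python
import numpy as np

def expm(N, terms=30):
    """Matrix exponential: truncated Taylor series combined with scaling and squaring."""
    norm = np.linalg.norm(N, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = N / 2.0 ** s                          # ||X||_1 <= 1/2, so the series converges fast
    E = term = np.eye(len(N))
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):                        # exp(N) = exp(X)^(2^s)
        E = E @ E
    return E

def sigma(M, p):
    """sigma(M) = (sgn det A, sgn det D), where A is the leading p x p block of M."""
    return (int(np.sign(np.linalg.det(M[:p, :p]))), int(np.sign(np.linalg.det(M[p:, p:]))))

def element_of(p, q, alpha, rng):
    """Hypothetical test element M^alpha exp(J K) of G_alpha (K skew-symmetric, J K in g)."""
    n = p + q
    J = np.diag([1.0] * p + [-1.0] * q)
    K = 0.5 * rng.standard_normal((n, n))
    K = K - K.T
    M_alpha = np.diag([float(alpha[0])] + [1.0] * (n - 2) + [float(alpha[1])])
    return M_alpha @ expm(J @ K)

rng = np.random.default_rng(1)
p, q = 2, 3
signs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
M1, M2 = element_of(p, q, (1, -1), rng), element_of(p, q, (-1, -1), rng)
print(sigma(M1, p), sigma(M2, p), sigma(M1 @ M2, p))   # (1, -1) (-1, -1) (-1, 1)
```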

7.5.2 The Lorentz Group $\mathbf{O}(1,3)$

If $p = 1$ and $q = 3$, the group $\mathbf{O}(1,3)$ is isomorphic to the orthogonal group of the Lorentz quadratic form $dt^2 - dx_1^2 - dx_2^2 - dx_3^2$, which defines the space-time distance in special relativity.¹ Each element $M$ of $\mathbf{O}(1,3)$ corresponds to the transformation

$$(t, x) \mapsto M\begin{pmatrix} t \\ x \end{pmatrix}, \qquad (t, x) \in \mathbb{R} \times \mathbb{R}^3,$$

which we still denote by $M$, by abuse of notation. This transformation preserves the light cone of equation $t^2 - x_1^2 - x_2^2 - x_3^2 = 0$. Since it is a homeomorphism of $\mathbb{R}^4$, it permutes the connected components of the complement $\mathcal{C}$ of that cone. There are three such components (see Figure 7.1):

• the convex set $C_+ := \{(t,x) \mid \|x\| < t\}$;

• the convex set $C_- := \{(t,x) \mid \|x\| < -t\}$;

• the "ring" $\mathcal{A} := \{(t,x) \mid |t| < \|x\|\}$.

Clearly, $C_+$ and $C_-$ are homeomorphic; for example, the time reversal $t \mapsto -t$ exchanges them. However, they are not homeomorphic to $\mathcal{A}$. The latter is homeomorphic to $S^2 \times \mathbb{R}^2$ (here, $S^2$ denotes the unit sphere), which is not contractible, while a convex set is always contractible. Since $M$ is a homeomorphism, one deduces that necessarily $M\mathcal{A} = \mathcal{A}$, while $MC_+ = C_\pm$ and $MC_- = C_\mp$.

¹We have selected a system of units in which the speed of light equals one.

Figure 7.1. The Lorentz cone.

The transformations that preserve $C_+$, and therefore every connected component of $\mathcal{C}$, form the orthochronous Lorentz group. Its elements are those that send the vector $e_0 := (1, 0, 0, 0)^T$ into $C_+$; that is, those for which the first component of $Me_0$ is positive. Since this component is $A$ (here it is nothing but a scalar), this group must be $G_{++} \cup G_{+-}$.
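Here is a small NumPy illustration; the names are hypothetical. A Lorentz boost preserves the form $J = \operatorname{diag}(1, -1, -1, -1)$ and is orthochronous. Composing it with the time reversal $t \mapsto -t$ gives an element of $\mathbf{O}(1,3)$ that exchanges $C_+$ and $C_-$. Since $p = 1$, the test "first component of $Me_0$ positive" is simply $A = m_{11} > 0$.

```python
import numpy as np

J = np.diag([1.0, -1.0, -1.0, -1.0])   # Lorentz form dt^2 - dx_1^2 - dx_2^2 - dx_3^2
T = np.diag([-1.0, 1.0, 1.0, 1.0])     # time reversal t -> -t

def boost(phi):
    """Lorentz boost of rapidity phi along x_1, acting on column vectors (t, x_1, x_2, x_3)."""
    B = np.eye(4)
    B[0, 0] = B[1, 1] = np.cosh(phi)
    B[0, 1] = B[1, 0] = np.sinh(phi)
    return B

def in_future_cone(v):
    """Membership in C_+ = {(t, x) : ||x|| < t}."""
    return np.linalg.norm(v[1:]) < v[0]

def is_orthochronous(M):
    """The first component of M e_0, i.e. the 1 x 1 block A = m_11, is positive."""
    return M[0, 0] > 0

e0 = np.array([1.0, 0.0, 0.0, 0.0])
M = boost(0.7)
print(np.allclose(M.T @ J @ M, J), is_orthochronous(M), is_orthochronous(T @ M))  # True True False
```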



7.6 The Symplectic Group $\mathbf{Sp}_n$

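Let us first study the maximal compact subgroup $\mathbf{Sp}_n \cap \mathbf{O}_{2n}$. If

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

with blocks of size $n \times n$, then $M \in \mathbf{Sp}_n$ means that

$$A^TC = C^TA, \qquad A^TD - C^TB = I_n, \qquad B^TD = D^TB,$$

while $M \in \mathbf{O}_{2n}$ yields

$$A^TA + C^TC = I_n, \qquad B^TB + D^TD = I_n, \qquad A^TB + C^TD = 0_n.$$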
But since $M^T \in \mathbf{Sp}_n$, we also have

$$AB^T = BA^T, \qquad AD^T - BC^T = I_n, \qquad CD^T = DC^T.$$

Let us combine these equations:

$$B = B(A^TA + C^TC) = AB^TA + (AD^T - I_n)C = A(B^TA + D^TC) - C = -C.$$

Similarly,

$$D = D(A^TA + C^TC) = (I_n + CB^T)A + CD^TC = A + C(B^TA + D^TC) = A.$$

Hence

$$M = \begin{pmatrix} A & B \\ -B & A \end{pmatrix}.$$

The remaining conditions are

$$A^TA + B^TB = I_n, \qquad A^TB = B^TA.$$

This amounts to saying that $A + iB$ is unitary. One immediately checks that the map $M \mapsto A + iB$ is an isomorphism from $\mathbf{Sp}_n \cap \mathbf{O}_{2n}$ onto $\mathbf{U}_n$.
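The inverse of this isomorphism, $U = A + iB \mapsto \begin{pmatrix} A & B \\ -B & A \end{pmatrix}$, can be checked numerically. The NumPy sketch below verifies, on random unitary matrices, that the image is orthogonal and symplectic and that the map is multiplicative. The helper names are hypothetical.

```python
import numpy as np

def embed(U):
    """Send U = A + iB in U_n to the real 2n x 2n matrix [[A, B], [-B, A]]."""
    A, B = U.real, U.imag
    return np.block([[A, B], [-B, A]])

def random_unitary(n, rng):
    """Q factor of a complex Gaussian matrix (a unitary test matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q

rng = np.random.default_rng(0)
n = 3
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])
U1, U2 = random_unitary(n, rng), random_unitary(n, rng)
M1, M2 = embed(U1), embed(U2)
print(np.allclose(M1.T @ J @ M1, J), np.allclose(M1.T @ M1, np.eye(2 * n)),
      np.allclose(embed(U1 @ U2), M1 @ M2))   # True True True
```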

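Finally, if

$$N = \begin{pmatrix} A & B \\ B^T & D \end{pmatrix}$$

is symmetric and $NJ + JN = 0_{2n}$, we have, in fact,

$$N = \begin{pmatrix} A & B \\ B & -A \end{pmatrix},$$

where $A$ and $B$ are symmetric. Hence $\mathfrak{g} \cap \mathbf{Sym}_{2n}$ is homeomorphic to $\mathbf{Sym}_n \times \mathbf{Sym}_n$, that is, to $\mathbb{R}^{n(n+1)}$.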



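Proposition 7.6.1 The symplectic group $\mathbf{Sp}_n$ is homeomorphic to $\mathbf{U}_n \times \mathbb{R}^{n(n+1)}$.

Corollary 7.6.1 In particular, every symplectic matrix has determinant $+1$.

Indeed, Proposition 7.6.1 shows that $\mathbf{Sp}_n$ is connected, since $\mathbf{U}_n$ is (Lemma 7.4.1). By Proposition 7.3.2, a real symplectic matrix satisfies $|\det M| = 1$, so the determinant is a continuous function with values in $\{-1, 1\}$. It is therefore constant, equal to $\det I_{2n} = +1$.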



7.7 Singular Value Decomposition 

As we shall see in Exercise 8 (see also Exercise 12 in Chapter 4), the eigenvalues of the matrix $H$ in the polar decomposition of a given matrix $M$ are of some importance. They are called the singular values of $M$. Since they are the square roots of the eigenvalues of $M^*M$, one may speak of the singular values of an arbitrary matrix, not necessarily invertible. Recall (see Exercise 17 in Chapter 2) that when $M$ is $n \times m$, $M^*M$ and $MM^*$ have the same nonzero eigenvalues, counting multiplicities. One may therefore even speak of the singular values of a rectangular matrix, up to an ambiguity concerning the multiplicity of the eigenvalue 0.

The main result of the section is the following. 



Theorem 7.7.1 Let $M \in \mathbf{M}_{n\times m}(\mathbb{C})$ be given. Then there exist two unitary matrices $U \in \mathbf{U}_n$, $V \in \mathbf{U}_m$ and a quasi-diagonal matrix

$$D = \begin{pmatrix} \operatorname{diag}(s_1, \ldots, s_r) & 0 \\ 0 & 0 \end{pmatrix} \in \mathbf{M}_{n\times m}(\mathbb{R}),$$

with $s_1, \ldots, s_r \in (0, +\infty)$, such that $M = UDV$. The numbers $s_1, \ldots, s_r$ are uniquely defined up to permutation; they are the nonzero singular values of $M$. In particular, $r$ is the rank of $M$.

If $M \in \mathbf{M}_{n\times m}(\mathbb{R})$, then one may choose $U$ and $V$ to be real orthogonal.

Remark: The factorization given in the theorem is far from unique, even for invertible square matrices. In fact, the number of real degrees of freedom in that factorization is $n^2 + m^2 + \min(n, m)$, which is always greater than the dimension $2nm$ of $\mathbf{M}_{n\times m}(\mathbb{C})$ as an $\mathbb{R}$-vector space.

Proof 

Since $MM^*$ is positive semidefinite, we may write its eigenvalues as $s_1^2, \ldots, s_r^2, 0, \ldots, 0$, where the $s_j$'s, the singular values of $M$, are positive real numbers. The spectrum of $M^*M$ has the same form, except for the multiplicity of 0. Indeed, the multiplicities of 0 as an eigenvalue of $MM^*$ and of $M^*M$ differ by $n - m$, while the multiplicities of the other eigenvalues are the same for both matrices. We set $S = \operatorname{diag}(s_1, \ldots, s_r)$.

Since $M$ and $MM^*$ have the same rank, and since $R(MM^*) \subset R(M)$, we have $R(MM^*) = R(M)$. Since $MM^*$ is Hermitian, its kernel is $R(M)^\perp$, where orthogonality is relative to the canonical scalar product; with the duality formula, we conclude that $\ker MM^* = \ker M^*$. We are now in a position to state that

$$\mathbb{C}^n = R(MM^*) \overset{\perp}{\oplus} \ker M^*.$$

Therefore, there exists an orthonormal basis $\{u_1, \ldots, u_n\}$ of $\mathbb{C}^n$ consisting of eigenvectors of $MM^*$ associated to the $s_j^2$'s, followed by vectors of $\ker M^*$. Let us form the unitary matrix

$$U = (u_1, \ldots, u_n).$$

Written blockwise, we have $U = (U_r, U_k)$, where

$$MM^*U_r = U_rS^2, \qquad M^*U_k = 0.$$
Let us now define $V_r := M^*U_rS^{-1}$. From the above, we have

$$V_r^*V_r = S^{-1}U_r^*MM^*U_rS^{-1} = I_r.$$

This means that the columns $v_1, \ldots, v_r$ of $V_r$ constitute an orthonormal family.

These column vectors belong to $R(M^*)$, that is, to $(\ker M)^\perp$, a subspace of dimension $r$. Hence $\{v_1, \ldots, v_r\}$ can be extended to an orthonormal basis $\{v_1, \ldots, v_m\}$ of $\mathbb{C}^m$, where $v_{r+1}, \ldots, v_m$ belong to $\ker M$. Let $V = (V_r, V_k)$ be the unitary matrix whose columns are $v_1, \ldots, v_m$.

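We now compute the product $U^*MV$ blockwise. From $MV_k = 0$ and $M^*U_k = 0$, we get

$$U^*MV = \begin{pmatrix} U_r^*MV_r & 0 \\ 0 & 0 \end{pmatrix}.$$

Finally, we obtain

$$U_r^*MV_r = U_r^*MM^*U_rS^{-1} = U_r^*U_rS = S,$$

that is, $M = UDV^*$, which is the announced factorization, since $V^* \in \mathbf{U}_m$.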
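The proof is constructive and translates almost line by line into NumPy. In this sketch (function name hypothetical), `numpy.linalg.eigh` supplies the eigenvectors of $MM^*$, and $V_r = M^*U_rS^{-1}$ is formed as in the text. As a choice of ours, the completion $V_k$ is taken from the eigenvectors of $M^*M$ for the eigenvalue 0, since $\ker M^*M = \ker M$. A numerical rank threshold `tol` replaces exact tests for zero.

```python
import numpy as np

def svd_following_proof(M, tol=1e-10):
    """Return U, s, V with U* M V = [[diag(s), 0], [0, 0]], built as in the proof above."""
    n, m = M.shape
    w, W = np.linalg.eigh(M @ M.conj().T)        # MM* is Hermitian positive semidefinite
    order = np.argsort(w)[::-1]                  # the eigenvalues s_j^2 first, then the zeros
    w, U = w[order], W[:, order]                 # U = (U_r, U_k), U_k spans ker M*
    r = int(np.sum(w > tol * max(1.0, w[0])))    # numerical rank
    s = np.sqrt(w[:r])
    Vr = (M.conj().T @ U[:, :r]) / s             # V_r = M* U_r S^{-1}
    _, W2 = np.linalg.eigh(M.conj().T @ M)       # ascending: the m - r zero eigenvalues first
    Vk = W2[:, :m - r]                           # orthonormal basis of ker M*M = ker M
    return U, s, np.hstack([Vr, Vk])

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
b = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
M = a @ b.conj().T                               # a 4 x 3 matrix of rank 2
U, s, V = svd_following_proof(M)
D = np.zeros((4, 3))
D[:2, :2] = np.diag(s)
print(np.allclose(U.conj().T @ M @ V, D))
```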



7.8 Exercises 

1. Show that the square root map from $\mathbf{HPD}_n$ into itself is continuous.

2. Let $M \in \mathbf{M}_n(k)$ be given, with $k = \mathbb{R}$ or $\mathbb{C}$. Show that there exists a polynomial $P \in k[X]$ of degree at most $n - 1$ such that $P(M) = \exp M$. However, show that this polynomial cannot be chosen independently of the matrix.

Compute this polynomial when $M$ is nilpotent.

3. For $t \in \mathbb{R}$, define Pascal's matrix $P(t)$ by $p_{ij}(t) = 0$ if $i < j$ (the matrix is lower triangular) and

$$p_{ij}(t) = t^{i-j}\binom{i-1}{j-1}$$

otherwise. Let us emphasize that, just this once in this book, $P$ is an infinite matrix, meaning that its indices range over the infinite set $\mathbb{N}^*$. Compute $P'(t)$ and deduce that there exists a matrix $L$ such that $P(t) = \exp(tL)$. Compute $L$ explicitly.

4. Let $I$ be an interval of $\mathbb{R}$ and $t \mapsto P(t)$ be a map of class $C^1$ with values in $\mathbf{M}_n(\mathbb{R})$ such that for each $t$, $P(t)$ is a projector: $P(t)^2 = P(t)$.

(a) Show that the rank of $P(t)$ is constant.

(b) Show that $P(t)P'(t)P(t) = 0_n$.

(c) Let us define $Q(t) := [P'(t), P(t)]$. Show that $P'(t) = [Q(t), P(t)]$.

(d) Let $t_0 \in I$ be given. Show that the differential equation $U' = QU$ possesses a unique solution in $I$ such that $U(t_0) = I_n$. Show that

$$P(t) = U(t)P(t_0)U(t)^{-1}.$$

5. Show that the set of projectors of given rank $p$ is a connected subset of $\mathbf{M}_n(\mathbb{C})$.

6. (a) Let $A \in \mathbf{HPD}_n$ and $B \in \mathbf{H}_n$ be given. Show that $AB$ is diagonalizable with real eigenvalues (though it is not necessarily Hermitian). Show also that the sum of the multiplicities of the positive (respectively zero, respectively negative) eigenvalues is the same for $AB$ as for $B$.

(b) Let $A, B, C$ be three Hermitian matrices such that $ABC \in \mathbf{H}_n$. Show that if three of the matrices $A, B, C, ABC$ are positive definite, then the fourth is positive definite too.

7. Let $M \in \mathbf{GL}_n(\mathbb{C})$ be given and let $M = HQ$ be its polar decomposition. Show that $M$ is normal if and only if $HQ = QH$.

8. The deformation of an elastic body is represented at each point by a square matrix $F \in \mathbf{GL}_3^+(\mathbb{R})$ (the sign $+$ expresses that $\det F > 0$). More generally, $F \in \mathbf{GL}_n^+(\mathbb{R})$ in other space dimensions. The density of elastic energy is given by a function $F \mapsto W(F) \in \mathbb{R}$.

(a) The principle of frame indifference says that $W(QF) = W(F)$ for every $F \in \mathbf{GL}_n^+(\mathbb{R})$ and every rotation $Q$. Show that there exists a map $w : \mathbf{SPD}_n \to \mathbb{R}$ such that $W(F) = w(H)$, where $F = QH$ is the polar decomposition.

(b) When the body is isotropic, we also have $W(FQ) = W(F)$ for every $F \in \mathbf{GL}_n^+(\mathbb{R})$ and every rotation $Q$. Show that there exists a map $\phi : \mathbb{R}^n \to \mathbb{R}$ such that $W(F) = \phi(h_1, \ldots, h_n)$, where the $h_j$ are the coefficients of the characteristic polynomial of $H$. In other words, $W(F)$ depends only on the singular values of $F$.

9. We use Schur's norm $\|A\| = (\operatorname{Tr} A^*A)^{1/2}$.

(a) If $A \in \mathbf{M}_n(\mathbb{C})$, show that there exists $Q \in \mathbf{U}_n$ such that $\|A - Q\| \leq \|A - U\|$ for every $U \in \mathbf{U}_n$. We define $S := Q^{-1}A$. We therefore have $\|S - I_n\| \leq \|S - U\|$ for every $U \in \mathbf{U}_n$.

(b) Let $H \in \mathbf{H}_n$ be a Hermitian matrix. Show that $\exp(itH) \in \mathbf{U}_n$ for every $t \in \mathbb{R}$. Compute the derivative at $t = 0$ of

$$\|S - \exp(itH)\|^2$$

and deduce that $S \in \mathbf{H}_n$.

(c) Let $D$ be a diagonal matrix unitarily similar to $S$. Show that $\|D - I_n\| \leq \|DU - I_n\|$ for every $U \in \mathbf{U}_n$. By selecting a suitable $U$, deduce that $S \geq 0_n$.

(d) If $A \in \mathbf{GL}_n(\mathbb{C})$, show that $QS$ is the polar decomposition of $A$.

(e) Deduce that if $H \in \mathbf{HPD}_n$ and if $U \in \mathbf{U}_n$, $U \neq I_n$, then $\|H - I_n\| < \|H - U\|$.

(f) Finally, show that if $H \in \mathbf{H}_n$, $H \geq 0_n$ and $U \in \mathbf{U}_n$, then $\|H - I_n\| \leq \|H - U\|$.



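10. Let $A \in \mathbf{M}_n(\mathbb{C})$ and $h \in \mathbb{C}$. Show that $I_n - hA$ is invertible as soon as $|h| < 1/\rho(A)$. One then denotes its inverse by $R(h; A)$.

(a) Let $r \in (0, 1/\rho(A))$. Show that there exists a $c_0 > 0$ such that for every $h \in \mathbb{C}$ with $|h| \leq r$, we have

$$\|R(h; A) - e^{hA}\| \leq c_0|h|^2.$$

(b) Verify the formula

$$R(h; A)^m - e^{mhA} = \sum_{k=0}^{m-1} R(h; A)^{m-1-k}\bigl(R(h; A) - e^{hA}\bigr)e^{khA}$$

and deduce a bound on $\|R(h; A)^m - e^{mhA}\|$ when $|h| \leq r$ and $m \in \mathbb{N}$.

(c) Show that for every $t \in \mathbb{C}$,

$$\lim_{m \to +\infty} R(t/m; A)^m = e^{tA}.$$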

11. (a) Let $J(a; r)$ be a Jordan block of size $r$, with $a \in \mathbb{C}^*$. Let $b \in \mathbb{C}$ be such that $a = e^b$. Show that there exists a nilpotent $N \in \mathbf{M}_r(\mathbb{C})$ such that $J(a; r) = \exp(bI_r + N)$.

(b) Show that $\exp : \mathbf{M}_n(\mathbb{C}) \to \mathbf{GL}_n(\mathbb{C})$ is onto, but that it is not one-to-one. Deduce that $X \mapsto X^2$ is onto $\mathbf{GL}_n(\mathbb{C})$. Verify that it is not onto $\mathbf{M}_n(\mathbb{C})$.



12. (a) Show that the matrix

$$J_2 := \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}$$

is not the square of any matrix of $\mathbf{M}_2(\mathbb{R})$.

(b) Show, however, that the matrix $J_4 := \operatorname{diag}(J_2, J_2)$ is the square of a matrix of $\mathbf{M}_4(\mathbb{R})$. Show also that the matrix

$$J_3 := \begin{pmatrix} J_2 & I_2 \\ 0_2 & J_2 \end{pmatrix}$$

is not the square of a matrix of $\mathbf{M}_4(\mathbb{R})$.

(c) Show that $J_2$ is not the exponential of any matrix of $\mathbf{M}_2(\mathbb{R})$. Compare with the previous exercise.

(d) Show that $J_4$ is the exponential of a matrix of $\mathbf{M}_4(\mathbb{R})$, but that $J_3$ is not.



13. Let $\mathbf{A}_n(\mathbb{C})$ be the set of skew-Hermitian matrices of size $n$. Show that $\exp : \mathbf{A}_n(\mathbb{C}) \to \mathbf{U}_n$ is onto. Hint: If $U$ is unitary, diagonalize it.

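14. (a) Let $\theta \in \mathbb{R}$ be given. Compute $\exp B$, where

$$B = \begin{pmatrix} 0 & \theta \\ -\theta & 0 \end{pmatrix}.$$

(b) Let $\mathbf{A}_n(\mathbb{R})$ be the set of real skew-symmetric matrices of size $n$. Show that $\exp : \mathbf{A}_n(\mathbb{R}) \to \mathbf{SO}_n$ is onto. Hint: Use the reduction of direct orthogonal matrices.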



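15. Let $\phi : \mathbf{M}_n(\mathbb{R}) \to \mathbb{R}$ be a nonnull map satisfying $\phi(AB) = \phi(A)\phi(B)$ for every $A, B \in \mathbf{M}_n(\mathbb{R})$. If $a \in \mathbb{R}$, we set $\delta(a) = |\phi(aI_n)|^{1/n}$. We have seen, in Exercise 16 of Chapter 3, that $|\phi(M)| = \delta(\det M)$ for every $M \in \mathbf{M}_n(\mathbb{R})$.

(a) Show that on the range of $M \mapsto M^2$ and on that of $M \mapsto \exp M$, $\phi = \delta \circ \det$.

(b) Deduce that $\phi = \delta \circ \det$ on $\mathbf{SO}_n$ (use Exercise 14) and on $\mathbf{SPD}_n$.

(c) Show that either $\phi = \delta \circ \det$ or $\phi = (\operatorname{sgn}\det)\,\delta \circ \det$.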



16. Let $\mathcal{A}$ be a $K$-Banach algebra ($K = \mathbb{R}$ or $\mathbb{C}$) with a unit denoted by $e$. If $x \in \mathcal{A}$, define $x^0 := e$.

(a) Given $x \in \mathcal{A}$, show that the series

$$\sum_{m \in \mathbb{N}} \frac{x^m}{m!}$$

converges normally, hence converges in $\mathcal{A}$. Its sum is denoted by $\exp x$.

(b) If $x, y \in \mathcal{A}$, $[x, y] = xy - yx$ is called the "commutator" of $x$ and $y$. Show that $[x, y] = 0$ implies

$$\exp(x + y) = (\exp x)(\exp y), \qquad [x, \exp y] = 0.$$

(c) Show that the map $t \mapsto \exp tx$ is differentiable on $\mathbb{R}$, with

$$\frac{d}{dt}\exp tx = x\exp tx = (\exp tx)x.$$

(d) Let $x, y \in \mathcal{A}$ be given. Assume that $[x, y]$ commutes with $x$ and $y$.

i. Show that $(\exp -tx)\,xy\,(\exp tx) = xy + t[y, x]x$.

ii. Deduce that $[\exp -tx, y] = t[y, x]\exp -tx$.

iii. Compute the derivative of $t \mapsto (\exp -ty)(\exp -tx)\exp t(x + y)$. Finally, prove the Campbell-Hausdorff formula

$$\exp(x + y) = (\exp x)(\exp y)\exp\left(\tfrac{1}{2}[y, x]\right).$$

(e) In $\mathcal{A} = \mathbf{M}_3(\mathbb{R})$, construct an example that satisfies the above hypothesis ($[x, y]$ commutes with $x$ and $y$), where $[x, y]$ is nonzero.

17. Show that the map

$$H \mapsto f(H) := (iI_n + H)(iI_n - H)^{-1}$$

induces a homeomorphism from $\mathbf{H}_n$ onto the set of matrices of $\mathbf{U}_n$ whose spectrum does not contain $-1$. Find an equivalent of $f(tH) - \exp(-2itH)$ as $t \to 0$.

18. Let $G$ be a group satisfying the hypotheses of Proposition 7.3.2.

(a) Show that $\mathfrak{g}$ is a Lie algebra, meaning that it is stable under the bilinear map $(A, B) \mapsto [A, B] := AB - BA$.

(b) Show that for $t \to 0^+$,

$$\exp(tA)\exp(tB)\exp(-tA)\exp(-tB) = I_n + t^2[A, B] + O(t^3).$$

Deduce another proof of the stability of $\mathfrak{g}$ under $[\cdot, \cdot]$.

(c) Show that the map $M \mapsto [A, M]$ is a derivation, meaning that the Jacobi identity

$$[A, [B, C]] = [[A, B], C] + [B, [A, C]]$$

holds.

19. In the case $p = 1$, $q \geq 1$, show that $G_{++} \cup G_{+-}$ is the set of matrices $M \in \mathbf{O}(p,q)$ such that the image under $M$ of the "time" vector $(1, 0, \ldots, 0)^T$ belongs to the convex cone whose equation is

$$x_1 > \sqrt{x_2^2 + \cdots + x_n^2}.$$

20. Assume that $p, q \geq 1$ and consider the group $\mathbf{O}(p,q)$. Define $G_0 := G_{++}$. Since $-I_n \in \mathbf{O}(p,q)$, we denote by $(\alpha, \beta)$ the indices for which $-I_n \in G_{\alpha\beta}$.

If $H \in \mathbf{GL}_n(\mathbb{R})$, denote by $\sigma_H$ the conjugation $M \mapsto H^{-1}MH$.

(a) Let $H \in G$ be given. Show that $\sigma_H$ (or rather its restriction to $G_0$) is an automorphism of $G_0$.

(b) Let $H \in \mathbf{M}_n(\mathbb{R})$ be such that $HM = MH$ for every $M \in G_0$. Show that $HN = NH$ for every $N \in \mathfrak{g}$. Deduce that $H$ is a homothety.

(c) Let $H \in G$. Show that there exists $K \in G_0$ such that $\sigma_H = \sigma_K$ if and only if $H \in G_0 \cup G_{\alpha\beta}$.
21. A topological group is a group $G$ endowed with a topology for which the maps $(g, h) \mapsto gh$ and $g \mapsto g^{-1}$ are continuous. Show that in a topological group, the connected component of the unit element is a normal subgroup. Show also that the open subgroups are closed. Give an example of a closed subgroup that is not open.

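22. One identifies $\mathbb{R}^{2n}$ with $\mathbb{C}^n$ by the map

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto x + iy.$$

Therefore, every matrix $M \in \mathbf{M}_{2n}(\mathbb{R})$ defines an $\mathbb{R}$-linear map $\tilde{M}$ from $\mathbb{C}^n$ into itself.

(a) Let

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in \mathbf{M}_{2n}(\mathbb{R})$$

be given. Under what condition on the blocks $A, B, C, D$ is the map $\tilde{M}$ $\mathbb{C}$-linear?

(b) Show that $M \mapsto \tilde{M}$ is an isomorphism from $\mathbf{Sp}_n \cap \mathbf{O}_{2n}$ onto $\mathbf{U}_n$.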




8 

Matrix Factorizations 



The direct solution (by Cramer's method) of a linear system $Mx = b$, where $M \in \mathbf{GL}_n(k)$ ($b \in k^n$), is computationally expensive, especially if one wishes to solve the system many times with various values of $b$. In the next chapter we shall study iterative methods for the case $k = \mathbb{R}$ or $\mathbb{C}$. Here we concentrate on a simple idea: decompose $M$ as a product $PQ$ in such a way that the intermediate systems $Py = b$ and $Qx = y$ are "cheap" to solve. In general, at least one of the matrices is triangular. For example, if $P$ is lower triangular ($p_{ij} = 0$ if $i < j$), then its diagonal entries $p_{ii}$ are nonzero ($P$ being invertible), and one may solve the system $Py = b$ step by step:

$$y_1 = \frac{b_1}{p_{11}}, \quad \ldots, \quad y_i = \frac{b_i - p_{i1}y_1 - \cdots - p_{i,i-1}y_{i-1}}{p_{ii}}, \quad \ldots, \quad y_n = \frac{b_n - p_{n1}y_1 - \cdots - p_{n,n-1}y_{n-1}}{p_{nn}}.$$

The computation of $y_i$ needs $2i - 1$ operations, and the final result is obtained in $n^2$ operations. This is not expensive compared with the $2n^2 - n$ operations needed to compute the product $x = M^{-1}b$, even assuming that $M^{-1}$ has been computed once and for all (itself an expensive task).
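The step-by-step resolution above (forward substitution) can be sketched in plain Python. The counter `ops` (a name of ours) tallies one multiplication and one subtraction per off-diagonal term and one division per row. It reproduces the count $\sum_i (2i - 1) = n^2$.

```python
def forward_substitution(P, b):
    """Solve P y = b, P lower triangular with nonzero diagonal; count the operations."""
    n = len(b)
    y = [0.0] * n
    ops = 0
    for i in range(n):
        acc = b[i]
        for j in range(i):
            acc -= P[i][j] * y[j]     # one multiplication, one subtraction
            ops += 2
        y[i] = acc / P[i][i]          # one division
        ops += 1
    return y, ops

P = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, -1.0, 5.0]]
y, ops = forward_substitution(P, [2.0, 7.0, 3.0])
print(y, ops)   # [1.0, 2.0, 0.2] 9
```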
Orthogonal matrices are another example of easily invertible matrices. If $Q \in \mathbf{O}_n$ (or $Q \in \mathbf{U}_n$), then $Qx = y$ amounts to $x = Q^Ty$ (or $x = Q^*y$), which is computed in $O(n^2)$ operations.

The techniques described below are often called direct solving methods. 



8.1 The LU Factorization 

Definition 8.1.1 Let $M \in \mathbf{GL}_n(k)$, where $k$ is a field. We say that $M$ admits an LU factorization if there exist in $\mathbf{GL}_n(k)$ two matrices $L$ (lower triangular with 1's on the diagonal) and $U$ (upper triangular) such that $M = LU$.

Remarks:

• The diagonal entries of $U$ are not equal to 1 in general. The LU factorization is thus asymmetric with respect to $L$ and $U$.

• The letters $L$ and $U$ recall the shape of the matrices: $L$ for lower and $U$ for upper.

• If an LU factorization exists (it is unique, as we shall see below), then it can be computed by induction on the size of the matrix. The algorithm is provided in the proof of the next theorem. Indeed, if $N^{(p)}$ denotes the matrix extracted from $N$ by keeping only the first $p$ rows and columns, we easily have

$$M^{(p)} = L^{(p)}U^{(p)},$$

where the matrices $L^{(p)}$ and $U^{(p)}$ have the required properties.

Definition 8.1.2 The leading principal minors of $M$ are the determinants of the matrices $M^{(p)}$, for $1 \leq p \leq n$.

Theorem 8.1.1 The matrix $M \in \mathbf{GL}_n(k)$ admits an LU factorization if and only if its leading principal minors are nonzero. When this condition is fulfilled, the LU factorization is unique.

Proof 

Let us begin with uniqueness. If $LU = L'U'$, then $(L')^{-1}L = U'U^{-1}$, which reads $L'' = U''$, where $L''$ and $U''$ are triangular of opposite types, the diagonal entries of $L''$ being 1's. We deduce $L'' = U'' = I_n$, that is, $L' = L$ and $U' = U$.

We next assume that $M$ admits an LU factorization. Then $\det M^{(p)} = \det L^{(p)}\det U^{(p)} = \prod_{1 \leq j \leq p} u_{jj}$, which is nonzero because $U$ is invertible.

We prove the converse (the existence of an LU factorization) by induction on the size of the matrices. It is clear if $n = 1$. Otherwise, let us assume that the statement is true up to order $n - 1$, and let $M \in \mathbf{GL}_n(k)$ be given, with nonzero leading principal minors. We look for $L$ and $U$ in the blockwise form

$$L = \begin{pmatrix} L' & 0 \\ X^T & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} U' & Y \\ 0 & u \end{pmatrix},$$

with $L', U' \in \mathbf{M}_{n-1}(k)$, etc. We likewise write

$$M = \begin{pmatrix} M' & R \\ S^T & m \end{pmatrix}.$$

Multiplying blockwise, we obtain the equations

$$L'U' = M', \qquad L'Y = R, \qquad (U')^TX = S, \qquad u = m - X^TY.$$

By assumption, the leading principal minors of $M'$ are nonzero. The induction hypothesis guarantees the existence of the factorization $M' = L'U'$. Then $Y$ and $X$ are the unique solutions of (triangular) Cramer systems. Finally, $u$ is given explicitly, and $u \neq 0$ because $\det M = u\det U' \neq 0$; hence $U$ is invertible.



Let us now compute the number of operations needed to compute $L$ and $U$. We pass from a factorization in $\mathbf{GL}_{n-1}(k)$ to a factorization in $\mathbf{GL}_n(k)$ by computing $X$ ($(n-1)^2$ operations, since $(U')^T$ has a nontrivial diagonal), $Y$ ($(n-1)(n-2)$ operations, since $L'$ has a unit diagonal) and $u$ ($2(n-1)$ operations), for a total of $(n-1)(2n-1)$ operations. Finally, the computation ex nihilo of an LU factorization costs $P(n)$ operations, where $P$ is a polynomial of degree three, with $P(X) = 2X^3/3 + \cdots$.

Proposition 8.1.1 The LU factorization is computable in $\frac{2}{3}n^3 + O(n^2)$ operations.

One says that the complexity of the LU factorization is $\frac{2}{3}n^3$.
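The inductive proof is itself an algorithm, sometimes called the bordering method. The plain-Python sketch below (function name hypothetical) extends $L'U' = M'$ by one row and one column at a time. At each step it solves $L'Y = R$ and $(U')^TX = S$, sets $u = m - X^TY$, and counts the operations. The total is

$$\sum_{j=2}^{n}(j-1)(2j-1) = \frac{n(n-1)(2n-1)}{3} + \frac{n(n-1)}{2} = \frac{2}{3}n^3 + O(n^2).$$

The exact tests for zero pivots are a simplification; floating-point code would use a tolerance or pivoting.

```python
def lu_bordering(M):
    """LU factorization by induction on the size (bordering), with an operation count.

    Raises ValueError if a leading principal minor of order < n vanishes."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    L[0][0], U[0][0] = 1.0, M[0][0]
    ops = 0
    for k in range(1, n):                  # pass from size k to size k + 1
        if U[k - 1][k - 1] == 0:
            raise ValueError("a leading principal minor vanishes")
        # Y: solve L'Y = R (unit lower triangular), R = M[0:k, k]; Y fills U[0:k, k]
        for i in range(k):
            acc = M[i][k]
            for j in range(i):
                acc -= L[i][j] * U[j][k]
                ops += 2
            U[i][k] = acc                  # unit diagonal: no division
        # X: solve (U')^T X = S, S^T = M[k, 0:k]; X^T fills L[k, 0:k]
        for i in range(k):
            acc = M[k][i]
            for j in range(i):
                acc -= U[j][i] * L[k][j]
                ops += 2
            L[k][i] = acc / U[i][i]
            ops += 1
        # u = m - X^T Y
        acc = M[k][k]
        for j in range(k):
            acc -= L[k][j] * U[j][k]
            ops += 2
        L[k][k], U[k][k] = 1.0, acc
    return L, U, ops

M = [[4.0, 3.0, 2.0],
     [2.0, 4.0, 1.0],
     [2.0, 1.0, 3.0]]
L, U, ops = lu_bordering(M)
print(ops)   # 13
```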

Remark: When all leading principal minors but the last ($\det M$) are nonzero, the proof above furnishes a factorization $M = LU$ in which $U$ is not invertible; that is, $u_{nn} = 0$.



8.1.1 Block Factorization

One can likewise perform a blockwise LU factorization. If $n = p_1 + \cdots + p_r$ with $p_j \geq 1$, the matrices $L$ and $U$ are block-triangular. The diagonal blocks are square, of respective sizes $p_1, \ldots, p_r$. Those of $L$ are of the form $I_{p_j}$, while those of $U$ are invertible (except possibly the last one). A necessary and sufficient condition for such a factorization to exist is that the leading principal minors of $M$ of orders $p_1 + \cdots + p_j$ ($j < r$) be nonzero. As above, the last minor $\det M$ need not be nonzero. Such a factorization is useful for solving the linear system $MX = b$ if the diagonal blocks of $U$ are easily inverted, for instance if their sizes are small enough (say $p_j \approx \sqrt{n}$).

An interesting application of block factorization is the computation of 
the determinant by the Schur complement formula: 
Proposition 8.1.2 Let $M \in \mathbf{M}_n(k)$ read blockwise

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where the diagonal blocks are square and $A$ is invertible. Then

$$\det M = \det A\,\det(D - CA^{-1}B).$$

Of course, this formula generalizes $\det M = ad - bc$, which is valid only for $2 \times 2$ matrices. The matrix $D - CA^{-1}B$ is called the Schur complement of $A$ in $M$.

Proof 

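Since $A$ is invertible, $M$ admits a blockwise LU factorization with the same subdivision. We easily compute

$$L = \begin{pmatrix} I & 0 \\ CA^{-1} & I \end{pmatrix}, \qquad U = \begin{pmatrix} A & B \\ 0 & D - CA^{-1}B \end{pmatrix}.$$

Then $\det M = \det L\,\det U$ furnishes the expected formula.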
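Here is a quick NumPy check of the Schur complement formula (function name hypothetical). `numpy.linalg.solve` forms $A^{-1}B$ without computing $A^{-1}$ explicitly.

```python
import numpy as np

def det_via_schur(M, p):
    """det M = det A * det(D - C A^{-1} B), where A is the leading p x p block (invertible)."""
    A, B = M[:p, :p], M[:p, p:]
    C, D = M[p:, :p], M[p:, p:]
    schur = D - C @ np.linalg.solve(A, B)    # Schur complement of A in M
    return np.linalg.det(A) * np.linalg.det(schur)

print(det_via_schur(np.array([[3.0, 2.0], [5.0, 7.0]]), 1))   # approximately ad - bc = 11
```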



Corollary 8.1.1 Let $M \in \mathbf{GL}_n(k)$, with $n = 2m$, read blockwise

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A, B, C, D \in \mathbf{GL}_m(k).$$

Then

$$M^{-1} = \begin{pmatrix} (A - BD^{-1}C)^{-1} & (C - DB^{-1}A)^{-1} \\ (B - AC^{-1}D)^{-1} & (D - CA^{-1}B)^{-1} \end{pmatrix}.$$

Proof

We can verify the formula by multiplying by $M$. The only point to show is that the inverses are meaningful, that is, that $A - BD^{-1}C, \ldots$ are invertible. Because of the symmetry of the formulas, it is enough to check this for a single term, namely $D - CA^{-1}B$. However, $\det(D - CA^{-1}B) = \det M/\det A$, which is nonzero by assumption.
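The four-block formula is easy to test numerically. The NumPy sketch below (function name hypothetical) assumes, as the corollary does, that all four blocks are invertible. This holds for a generic random matrix.

```python
import numpy as np

def inverse_by_corollary(M):
    """M^{-1} for M = [[A, B], [C, D]] with n = 2m and A, B, C, D all invertible."""
    m = M.shape[0] // 2
    A, B, C, D = M[:m, :m], M[:m, m:], M[m:, :m], M[m:, m:]
    inv = np.linalg.inv
    return np.block([
        [inv(A - B @ inv(D) @ C), inv(C - D @ inv(B) @ A)],
        [inv(B - A @ inv(C) @ D), inv(D - C @ inv(A) @ B)],
    ])

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
print(np.allclose(inverse_by_corollary(M) @ M, np.eye(6)))
```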



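We might add that as soon as $M \in \mathbf{GL}_n(k)$ and $A \in \mathbf{GL}_p(k)$ (even if $p \neq n/2$), then

$$M^{-1} = \begin{pmatrix} \ast & \ast \\ \ast & (D - CA^{-1}B)^{-1} \end{pmatrix},$$

because $M$ admits the blockwise LU factorization and

$$M^{-1} = U^{-1}L^{-1} = \begin{pmatrix} A^{-1} & \ast \\ 0 & (D - CA^{-1}B)^{-1} \end{pmatrix}\begin{pmatrix} I & 0 \\ \ast & I \end{pmatrix}.$$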
8.1.2 Complexity of Matrix Inversion

We can now show that the complexity of inverting a matrix is not higher than that of matrix multiplication, at equivalent sizes. We assume here that $k = \mathbb{R}$ or $\mathbb{C}$.

Notation 8.1.1 We denote by $J_n$ the number of operations in $k$ used in the inversion of an $n \times n$ matrix, and by $P_n$ the number of operations (in $k$) used in the product of two $n \times n$ matrices.

Of course, the number $J_n$ must be understood for generic matrices, that is, for matrices within a dense open subset of $\mathbf{M}_n(k)$. More importantly, $J_n$ and $P_n$ also depend on the algorithm chosen for inversion or multiplication. In the sequel we wish to adapt the inversion algorithm to the one used for multiplication.

Let us first examine the matrices whose size is a power of 2, $n = 2^k$. We decompose the matrices $M \in \mathbf{GL}_n(k)$ blockwise, with blocks of size $n/2 \times n/2$. The condition $A \in \mathbf{GL}_{n/2}(k)$ defines a dense open set, since $M \mapsto \det A$ is a nonzero polynomial. Suppose that we are given an algorithm that inverts generic matrices in $\mathbf{GL}_{n/2}(k)$ in $j_{k-1}$ operations. Then the blockwise LU factorization and the formula $M^{-1} = U^{-1}L^{-1}$ furnish an inversion algorithm for generic matrices in $\mathbf{GL}_n(k)$. We can then bound its cost $j_k$ by means of $j_{k-1}$ and the number $\pi_{k-1} = P_{2^{k-1}}$ of operations used to compute the product of two matrices of size $2^{k-1} \times 2^{k-1}$. We also denote by $\sigma_k = 2^{2k}$ the number of operations involved in the sum of two matrices in $\mathbf{M}_{2^k}(k)$.

To compute $M^{-1}$, we first compute $A^{-1}$, then $CA^{-1}$, which gives us $L^{-1}$ in $j_{k-1} + \pi_{k-1}$ operations. Then we compute $(D - CA^{-1}B)^{-1}$ (this amounts to $\sigma_{k-1} + \pi_{k-1} + j_{k-1}$ operations) and $A^{-1}B(D - CA^{-1}B)^{-1}$ (cost: $2\pi_{k-1}$), which furnishes $U^{-1}$. The computation of $U^{-1}L^{-1}$ is done at the cost $\sigma_{k-1} + 2\pi_{k-1}$. Finally,

$$j_k \le 2j_{k-1} + 2\sigma_{k-1} + 6\pi_{k-1}.$$

In other words,

$$2^{-k}j_k - 2^{1-k}j_{k-1} \le 2^{k-1} + 3 \cdot 2^{1-k}\pi_{k-1}. \qquad (8.1)$$
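The recursion just described can be sketched in Python. This is a minimal illustration under several assumptions: the size is $n = 2^k$, arithmetic is exact (`Fraction`), and the matrix is *generic* in the sense of the text, meaning that every leading block $A$ and every Schur complement $S = D - CA^{-1}B$ met in the recursion is invertible (no pivoting is done). The function names are hypothetical. Each level performs the six block products and two block sums counted in the bound on $j_k$.

```python
from fractions import Fraction

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, s=1):
    """Entrywise X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg(X):
    return [[-x for x in row] for row in X]

def split(M):
    """Return the blocks A, B, C, D of size n/2 x n/2."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def join(P, Q, R, S):
    return [a + b for a, b in zip(P, Q)] + [a + b for a, b in zip(R, S)]

def block_lu_inverse(M):
    """Inverse of a generic 2^k x 2^k matrix through M^{-1} = U^{-1} L^{-1}."""
    if len(M) == 1:
        return [[1 / Fraction(M[0][0])]]
    A, B, C, D = split(M)
    Ai = block_lu_inverse(A)                        # j_{k-1}
    CAi = mul(C, Ai)                                # pi; L^{-1} = [[I, 0], [-C A^{-1}, I]]
    Si = block_lu_inverse(add(D, mul(CAi, B), -1))  # sigma + pi + j_{k-1}
    AiBSi = mul(mul(Ai, B), Si)                     # 2 pi: A^{-1} B S^{-1}
    # U^{-1} = [[A^{-1}, -A^{-1} B S^{-1}], [0, S^{-1}]]; multiply by L^{-1}:
    top_left = add(Ai, mul(AiBSi, CAi))             # sigma + pi
    bottom_left = neg(mul(Si, CAi))                 # pi
    return join(top_left, neg(AiBSi), bottom_left, Si)

# Symmetric and strictly diagonally dominant with positive diagonal, hence
# positive definite: all leading blocks and Schur complements are invertible.
M = [[4, 1, 2, 0], [1, 3, 0, 1], [2, 0, 5, 1], [0, 1, 1, 4]]
Minv = block_lu_inverse(M)
assert mul(M, Minv) == [[int(i == j) for j in range(4)] for i in range(4)]
```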

The complexity of the product in $M_n(k)$ obeys the inequalities

$$n^2 \le P_n \le n^2(2n - 1).$$

The first inequality is due to the number of data ($2n^2$) and the fact that each operation involves only two of them. The second is given by the naive algorithm, which computes each of the $n^2$ entries as a scalar product of length $n$ ($n$ multiplications and $n - 1$ additions).

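Lemma 8.1.1 If $P_n \le c_\alpha n^\alpha$ (with $2 \le \alpha \le 3$), then $j_l \le C_\alpha 2^{\alpha l}$, where

$$C_\alpha = 1 + \frac{3c_\alpha}{2^{\alpha-1} - 1}.$$

It is enough to sum (8.1) from $k = 1$ to $l$, using $j_0 = 1$, $\pi_{k-1} \le c_\alpha 2^{\alpha(k-1)}$, $4^l \le 2^{\alpha l}$, and the inequality $1 + q + \cdots + q^{l-1} \le q^l/(q - 1)$ for $q > 1$ (here $q = 2^{\alpha-1}$).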
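The bound of Lemma 8.1.1 can be checked numerically. The sketch runs the recurrence $j_k = 2j_{k-1} + 2\sigma_{k-1} + 6\pi_{k-1}$ with equality, which is the worst case the inequality allows. It assumes $j_0 = 1$ and a product cost of exactly $\pi_k = c_\alpha 2^{\alpha k}$. The defaults $c_\alpha = 2$, $\alpha = 3$ match the naive product, since $n^2(2n-1) \le 2n^3$. The function name is hypothetical.

```python
def inversion_cost(l, c=2.0, alpha=3.0):
    """Worst case of j_k <= 2 j_{k-1} + 2 sigma_{k-1} + 6 pi_{k-1}, with j_0 = 1,
    sigma_{k-1} = 4**(k-1) and an assumed product cost pi_{k-1} = c * 2**(alpha*(k-1)).
    Returns j_l together with the bound C_alpha * 2**(alpha*l) of Lemma 8.1.1."""
    j = 1.0
    for k in range(1, l + 1):
        j = 2 * j + 2 * 4 ** (k - 1) + 6 * c * 2 ** (alpha * (k - 1))
    C = 1 + 3 * c / (2 ** (alpha - 1) - 1)
    return j, C * 2 ** (alpha * l)

for l in range(6):
    j, bound = inversion_cost(l)
    print(l, j, bound)
```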
When $n$ is not a power of 2, we obtain $M^{-1}$ by computing the inverse of a block-diagonal matrix $\mathrm{diag}(M, I)$, whose size $N$ satisfies $n \le N = 2^l < 2n$. We obtain $J_n \le j_l \le C_\alpha 2^{\alpha l} < 2^\alpha C_\alpha n^\alpha$. Finally, we have the following result.



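Proposition 8.1.3 If the complexity $P_n$ of the product in $M_n(\mathbb{C})$ is bounded by $c_\alpha n^\alpha$, then the complexity $J_n$ of inversion in $GL_n(\mathbb{C})$ is bounded by $d_\alpha n^\alpha$, where

$$d_\alpha = 2^\alpha\left(1 + \frac{3c_\alpha}{2^{\alpha-1} - 1}\right).$$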



That can be summarized as follows: 



Those who know how to multiply know also how to invert. 



8.1.3 Complexity of the Matrix Product 



The ideas that follow apply to the product of rectangular matrices, but for 
the sake of simplicity, we present only the case of square matrices. 

As we have seen above, the complexity of matrix multiplication in $M_n(k)$ is $O(n^3)$. However, better algorithms allow us to improve the exponent 3. The simplest and oldest one is Strassen's algorithm, which uses a recursion. We note first that there exists a way of computing the product of two $2 \times 2$ matrices by means of 7 multiplications and 18 additions. Two features of Strassen's formula are essential. First, the number of multiplications that it involves is strictly less than that (eight) of the naive algorithm. Second, the method is valid when the matrices have entries in a noncommutative ring, and so it can be employed for two matrices $M, N \in M_n(k)$, considered as elements of $M_2(A)$, with $A := M_{n/2}(k)$. Since the 18 additions act on blocks of size $n/2 \times n/2$, they cost $18(n/2)^2 = 9n^2/2$ operations. This trick yields

$$P_n \le 7P_{n/2} + 9n^2/2.$$

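For $n = 2^k$, with $\pi_k = P_{2^k}$ as above, we then have

$$7^{-k}\pi_k - 7^{1-k}\pi_{k-1} \le \frac{9}{2}\left(\frac{4}{7}\right)^k,$$

which, after summation from $k = 1$ to $l$ (and using $\pi_0 = 1 \le 9/2$), gives

$$7^{-l}\pi_l \le \frac{9}{2}\sum_{k=0}^{l}\left(\frac{4}{7}\right)^k < \frac{9}{2} \cdot \frac{1}{1 - 4/7},$$

because of $4/7 < 1$. Finally,

$$\pi_l \le \frac{21}{2}\, 7^l.$$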
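The arithmetic behind the bound $\pi_l \le \frac{21}{2}7^l$ can be checked directly. The sketch takes the recurrence $\pi_k = 7\pi_{k-1} + 18 \cdot 4^{k-1}$ (that is, $9n^2/2$ with $n = 2^k$) with equality and $\pi_0 = 1$. The function name is hypothetical, and integer arithmetic keeps the comparison exact.

```python
def strassen_cost(l):
    """Worst case of pi_k <= 7 pi_{k-1} + 9 * 4**k / 2 with pi_0 = 1 (n = 2**k)."""
    p = 1
    for k in range(1, l + 1):
        p = 7 * p + 18 * 4 ** (k - 1)   # 9 * 4**k / 2 = 18 * 4**(k-1)
    return p

# 2 * pi_l <= 21 * 7**l, i.e. pi_l <= (21/2) 7**l, for every l tried.
assert all(2 * strassen_cost(l) <= 21 * 7 ** l for l in range(40))
```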



When $n$ is not a power of two, one chooses $l$ such that $n \le 2^l < 2n$. Padding the matrices with zeros shows that $P_n \le \pi_l$, and we obtain the following result.
Proposition 8.1.4 The complexity of the multiplication of $n \times n$ matrices is $O(n^\alpha)$, with $\alpha = \log 7/\log 2 = 2.807\ldots$ More precisely,

$$P_n \le \frac{147}{2}\, n^{\log 7/\log 2}.$$



The exponent $\alpha$ can be improved, at the cost of greater complication and a larger constant $c_\alpha$. The best exponent known in 1997, due to Coppersmith and Winograd [11], is $\alpha = 2.376\ldots$ A rather complete analysis can be found in the book by P. Bürgisser, M. Clausen, and M. A. Shokrollahi [7].

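Here is Strassen's formula [33]. Let $M, N \in M_2(A)$, with

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad N = \begin{pmatrix} x & y \\ z & t \end{pmatrix}.$$

One first forms the expressions $x_1 = (a + d)(x + t)$, $x_2 = (c + d)x$, $x_3 = a(y - t)$, $x_4 = d(z - x)$, $x_5 = (a + b)t$, $x_6 = (c - a)(x + y)$, $x_7 = (b - d)(z + t)$. Then one computes the product

$$MN = \begin{pmatrix} x_1 + x_4 - x_5 + x_7 & x_3 + x_5 \\ x_2 + x_4 & x_1 - x_2 + x_3 + x_6 \end{pmatrix}.$$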
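Transcribing Strassen's formula into a recursive Python function may help. In this sketch the blocks $a, b, c, d$ of $M$ and $x, y, z, t$ of $N$ are themselves matrices, which is exactly why the formula must avoid commutativity. The size is assumed to be a power of two, and the names (`strassen`, `naive_mul`) are illustrative. The result is checked against the naive product.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def naive_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def strassen(M, N):
    """Product of two 2^k x 2^k matrices with 7 recursive block products."""
    n = len(M)
    if n == 1:
        return [[M[0][0] * N[0][0]]]
    h = n // 2

    def block(X, r, s):  # h x h block starting at row r, column s
        return [row[s:s + h] for row in X[r:r + h]]

    a, b, c, d = block(M, 0, 0), block(M, 0, h), block(M, h, 0), block(M, h, h)
    x, y, z, t = block(N, 0, 0), block(N, 0, h), block(N, h, 0), block(N, h, h)
    x1 = strassen(add(a, d), add(x, t))
    x2 = strassen(add(c, d), x)
    x3 = strassen(a, sub(y, t))
    x4 = strassen(d, sub(z, x))
    x5 = strassen(add(a, b), t)
    x6 = strassen(sub(c, a), add(x, y))
    x7 = strassen(sub(b, d), add(z, t))
    top = [l + r for l, r in zip(add(sub(add(x1, x4), x5), x7), add(x3, x5))]
    bottom = [l + r for l, r in zip(add(x2, x4), add(add(sub(x1, x2), x3), x6))]
    return top + bottom

M = [[(3 * i + 5 * j) % 7 - 3 for j in range(8)] for i in range(8)]
N = [[(i * j + 2) % 5 - 2 for j in range(8)] for i in range(8)]
assert strassen(M, N) == naive_mul(M, N)
```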



8.2 Choleski Factorization 

In this section $k = \mathbb{R}$, and we consider symmetric positive definite matrices.

Theorem 8.2.1 Let $M \in SPD_n$. Then there exists a unique lower triangular matrix $L \in M_n(\mathbb{R})$, with strictly positive diagonal entries, satisfying $M = LL^T$.



Proof 

Let us begin with uniqueness. If $L_1$ and $L_2$ have the properties stated above, then $I_n = LL^T$, for $L = L_2^{-1}L_1$, which still has the same form. In other words, $L^T = L^{-1}$, where both sides are triangular matrices, but of opposite types (upper and lower). The equality shows that $L$ is actually diagonal, with $L^2 = I_n$. Since its diagonal is positive, we obtain $L = I_n$; that is, $L_2 = L_1$.

We shall give two constructions of L. 

First method. For every $p$, the leading principal submatrix $M_p$ of size $p$ is positive definite (test the quadratic form induced by $M$ on the linear subspace $\mathbb{R}^p \times \{0\}$). The leading principal minors of $M$ are thus nonzero and there exists an LU factorization $M = L_0U_0$. Let $D$ be the diagonal of $U_0$, which is invertible. Then $U_0 = DU_1$, where the diagonal entries of $U_1$ equal 1. By transposition, we have $M = U_1^T D L_0^T$. From uniqueness of the LU factorization, we deduce $U_1 = L_0^T$ and $M = L_0 D L_0^T$. Then $L = L_0\sqrt{D}$ satisfies the conditions of the theorem. Observe that $D > 0$ because $D = PMP^T$, with $P = L_0^{-1}$.
Second method. We proceed by induction on $n$. The statement is clear if $n = 1$. Otherwise, we seek an $L$ of the form

$$L = \begin{pmatrix} L' & 0 \\ X^T & l \end{pmatrix},$$

knowing that

$$M = \begin{pmatrix} M' & R \\ R^T & m \end{pmatrix}.$$

The matrix $L'$ is obtained by Choleski factorization of $M'$, which belongs to $SPD_{n-1}$. Then $X$ is obtained by solving $L'X = R$. Finally, $l$ is a square root of $m - \|X\|^2$. Since $0 < \det M = (m - \|X\|^2)(\det L')^2$, we see that $m - \|X\|^2 > 0$; we thus choose $l = \sqrt{m - \|X\|^2}$. This method again shows uniqueness.
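The second method translates almost line by line into code. At step $i$, the factor computed so far plays the role of $L'$, one forward substitution gives $X$, and then $l = \sqrt{m - \|X\|^2}$. The following is a minimal floating-point sketch with hypothetical function names. It assumes a real symmetric input and raises an error when $m - \|X\|^2 \le 0$, which by the argument above can only happen if $M$ is not positive definite.

```python
import math

def forward_solve(L, r):
    """Solve L x = r for lower triangular L with nonzero diagonal."""
    x = []
    for i, row in enumerate(L):
        x.append((r[i] - sum(row[j] * x[j] for j in range(i))) / row[i])
    return x

def cholesky_bordered(M):
    """Choleski factor of a symmetric positive definite M (list of rows),
    built by bordering: L = [[L', 0], [X^T, l]] at each step."""
    L = []
    for i in range(len(M)):
        R = M[i][:i]                    # off-diagonal part of the new row/column
        X = forward_solve(L, R)         # L' X = R
        d = M[i][i] - sum(v * v for v in X)   # m - ||X||^2
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L = [row + [0.0] for row in L] + [X + [math.sqrt(d)]]
    return L

M = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
print(cholesky_bordered(M))   # lower triangular factor with positive diagonal
```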



Remark: Choleski factorization extends to Hermitian positive definite ma- 
trices. In that case, L has complex entries, but its diagonal entries are still 
real and positive. 



8.3 The QR Factorization 

In this section $k = \mathbb{R}$ or $\mathbb{C}$, the real case being a particular case of the complex one.

Proposition 8.3.1 Let $M \in GL_n(\mathbb{C})$ be given. Then there exist a unitary matrix $Q$ and an upper triangular matrix $R$, whose diagonal entries are real positive, such that $M = QR$. This factorization is unique.

We observe that the condition on the numbers $r_{jj}$ is essential for uniqueness. In fact, if $D$ is diagonal with $|d_{jj}| = 1$ for every $j$, then $Q' := QD$ is unitary, $R' := D^*R$ is upper triangular, and $M = Q'R'$, which gives an infinity of “QR” factorizations. Even in the real case, where $Q$ is orthogonal, there are $2^n$ “QR” factorizations.

Proof 

We first prove uniqueness. If $(Q_1, R_1)$ and $(Q_2, R_2)$ give two factorizations, then $Q = R$, with $Q := Q_2^{-1}Q_1$ and $R := R_2R_1^{-1}$. Since $Q$ is unitary, we deduce $Q^* = R^{-1}$, or $Q = R^{-*}$. This shows (recall that the inverse of a triangular matrix is a triangular matrix of the same type) that $Q$ is simultaneously upper and lower triangular, and is therefore diagonal. Additionally, its diagonal part is strictly positive. Then $Q^2 = Q^*Q = I_n$ gives $Q = I_n$. Finally, $Q_2 = Q_1$ and consequently, $R_2 = R_1$.

The existence follows from that of the Choleski factorization. If $M \in GL_n(\mathbb{C})$, the matrix $M^*M$ is Hermitian positive definite, hence admits a Choleski factorization $R^*R$, where $R$ is upper triangular with real positive diagonal entries. Defining $Q := MR^{-1}$, we have

$$Q^*Q = R^{-*}M^*MR^{-1} = R^{-*}R^*RR^{-1} = I_n;$$

hence $Q$ is unitary. Finally, $M = QR$ by construction.

■ 

The method used above is unsatisfactory from a practical point of view, because one can compute $Q$ and $R$ directly, at a lower cost, without computing $M^*M$ or its Choleski factorization. Moreover, the direct method, which we shall present now, is based on a theoretical observation: The QR factorization is nothing but the Gram-Schmidt orthonormalization procedure in $\mathbb{C}^n$, endowed with the canonical scalar product $\langle \cdot, \cdot \rangle$. In fact, if $V^1, \dots, V^n$ denote the column vectors of $M$, then giving $M$ in $GL_n(\mathbb{C})$ amounts to giving a basis of $\mathbb{C}^n$. If $Y^1, \dots, Y^n$ denote the column vectors of $Q$, then $\{Y^1, \dots, Y^n\}$ is an orthonormal basis. Moreover, if $M = QR$, then

$$V^k = \sum_{j=1}^{k} r_{jk} Y^j.$$

Denoting by $E_k$ the linear subspace spanned by $Y^1, \dots, Y^k$, of dimension $k$, one sees that $V^1, \dots, V^k$ are in $E_k$; that is, $\{V^1, \dots, V^k\}$ is a basis of $E_k$. Hence, the columns of $Q$ are obtained by the Gram-Schmidt procedure, applied to the columns of $M$.

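The practical computation of $Q$ and $R$ is done by induction on $k$. If $k = 1$, then

$$r_{11} = \|V^1\|, \qquad Y^1 = \frac{V^1}{r_{11}}.$$

If $k > 1$, and if $Y^1, \dots, Y^{k-1}$ are already known, one looks for $Y^k$ and the entries $r_{jk}$ ($j \le k$). For $j < k$, we have

$$r_{jk} = \langle V^k, Y^j \rangle.$$

Then

$$r_{kk} = \|Z^k\|, \qquad Y^k = \frac{Z^k}{r_{kk}},$$

where

$$Z^k := V^k - \sum_{j=1}^{k-1} r_{jk} Y^j.$$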

Let us examine the complexity of the procedure described above. To pass from step $k - 1$ to step $k$, one computes $k - 1$ scalar products, then $Z^k$, its norm, and finally $Y^k$. This requires at most $(4n - 1)k + 3n$ operations. Summing from $k = 1$ to $n$ yields $2n^3 + O(n^2)$ operations. This method is not optimal, as we shall see in Section 10.2.3.
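The induction above is the classical Gram-Schmidt procedure. Here is a minimal Python sketch of it, assuming $M$ is invertible. It relies on Python's built-in `complex`, so the same code covers $\mathbb{R}$ and $\mathbb{C}$. The scalar product is taken conjugate-linear in its second argument, $\langle V, Y \rangle = \sum_i V_i \bar Y_i$; this is an assumption about the convention of the text. The function name is hypothetical.

```python
import math

def qr_gram_schmidt(M):
    """Return (Q, R) with M = QR, Q unitary, R upper triangular, r_kk > 0."""
    n = len(M)
    V = [[M[i][k] for i in range(n)] for k in range(n)]   # columns V^1, ..., V^n
    Y = []                                                # columns Y^1, ..., Y^n of Q
    R = [[0] * n for _ in range(n)]
    for k in range(n):
        Z = list(V[k])
        for j in range(k):
            # r_jk = <V^k, Y^j>, conjugate-linear in Y^j
            R[j][k] = sum(V[k][i] * Y[j][i].conjugate() for i in range(n))
            Z = [Z[i] - R[j][k] * Y[j][i] for i in range(n)]
        R[k][k] = math.sqrt(sum(abs(z) ** 2 for z in Z))  # r_kk = ||Z^k||
        Y.append([z / R[k][k] for z in Z])                # Y^k = Z^k / r_kk
    Q = [[Y[k][i] for k in range(n)] for i in range(n)]
    return Q, R

Q, R = qr_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Classical Gram-Schmidt is used here because it mirrors the formulas of the text. It is known to lose orthogonality in floating point for ill-conditioned $M$, which is one reason other methods are preferred in practice.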
The interest of this construction lies also in giving a more complete statement than Proposition 8.3.1:

Theorem 8.3.1 Let $M \in M_n(\mathbb{C})$ be a matrix of rank $p$. There exist $Q \in U_n$ and an upper triangular matrix $R$, with $r_{ll} \in \mathbb{R}_+$ for every $l$ and $r_{jk} = 0$ for $j > p$, such that $M = QR$.

Remarks: The QR factorization of a singular matrix (i.e., a noninvertible one) is not unique. There exists, in fact, a QR factorization for rectangular matrices, in which $R$ is a “quasi-triangular” matrix.



8.4 The Moore-Penrose Generalized Inverse 

The resolution of a general linear system $Ax = b$, where $A$ may be singular and may not even be square, is a delicate question, whose treatment is made much simpler by the use of the Moore-Penrose generalized inverse. We begin with the fundamental theorem.

Theorem 8.4.1 Let $A \in M_{n \times m}(\mathbb{C})$ be given. There exists a unique matrix $A^\dagger \in M_{m \times n}(\mathbb{C})$, called the Moore-Penrose generalized inverse, satisfying the following four properties:

1. $AA^\dagger A = A$;

2. $A^\dagger AA^\dagger = A^\dagger$;

3. $AA^\dagger \in H_n$;

4. $A^\dagger A \in H_m$.

Finally, if $A$ has real entries, then so has $A^\dagger$.

When $A \in GL_n(\mathbb{C})$, $A^\dagger$ coincides with the standard inverse $A^{-1}$, since the latter obviously satisfies the four properties. More generally, if $A$ is onto, then property 1 shows that $AA^\dagger = I_n$, i.e., $A^\dagger$ is a right inverse of $A$. Likewise, if $A$ is one-to-one, then $A^\dagger A = I_m$, i.e., $A^\dagger$ is a left inverse of $A$.

Proof 

We first remark that if $X$ is a generalized inverse of $A$, that is, it satisfies these four properties, and if $U \in U_n$, $V \in U_m$, then $V^*XU^*$ is a generalized inverse of $UAV$. Therefore, existence and uniqueness need to be proved for only a single representative $D$ of the equivalence class of $A$ modulo unitary multiplications on the right and the left. From Theorem 7.7.1, we may choose $D = \mathrm{diag}(s_1, \dots, s_r, 0, \dots)$, where $s_1, \dots, s_r$ are the nonzero singular values of $A$.

We are thus concerned only with quasi-diagonal matrices $D$. Let $D^\dagger$ be any generalized inverse of $D$, which we write blockwise as

$$D^\dagger = \begin{pmatrix} G & H \\ J & K \end{pmatrix}.$$

We use the notation of Theorem 7.7.1. From property 1, we obtain $S = SGS$, where $S := \mathrm{diag}(s_1, \dots, s_r)$. Since $S$ is nonsingular, we obtain $G = S^{-1}$. Next, property 3 implies $SH = 0$, that is, $H = 0$. Likewise, property 4 gives $JS = 0$, that is, $J = 0$. Finally, property 2 yields $K = JSH = 0$. We see, then, that $D^\dagger$ must equal (uniqueness)

$$\begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix}.$$

One easily checks that this matrix solves our problem (existence).

■ 
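The one-to-one and onto cases mentioned after Theorem 8.4.1 can be made concrete. For a one-to-one $A$, the four properties can be checked for $(A^*A)^{-1}A^*$, and for an onto $A$ they can be checked for $A^*(AA^*)^{-1}$. These closed forms are standard facts and are not stated in the text. The sketch applies them to real matrices with exact `Fraction` arithmetic and then tests the four defining properties directly. The helper names are hypothetical.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(X):
    """Exact Gauss-Jordan inverse of an invertible square matrix."""
    n = len(X)
    W = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        p = next(r for r in range(c, n) if W[r][c] != 0)
        W[c], W[p] = W[p], W[c]
        piv = W[c][c]
        W[c] = [v / piv for v in W[c]]
        for r in range(n):
            if r != c:
                f = W[r][c]
                W[r] = [a - f * b for a, b in zip(W[r], W[c])]
    return [row[n:] for row in W]

def pinv_full_rank(A):
    """A^+ for a real A assumed one-to-one (rows >= columns) or onto."""
    if len(A) >= len(A[0]):
        return mul(inverse(mul(transpose(A), A)), transpose(A))
    return mul(transpose(A), inverse(mul(A, transpose(A))))

def is_moore_penrose(A, X):
    """The four properties of Theorem 8.4.1 (real case: Hermitian = symmetric)."""
    AX, XA = mul(A, X), mul(X, A)
    return (mul(AX, A) == A and mul(XA, X) == X
            and AX == transpose(AX) and XA == transpose(XA))

A = [[1, 0], [0, 1], [1, 1]]          # 3 x 2, one-to-one
X = pinv_full_rank(A)
assert is_moore_penrose(A, X)
assert mul(X, A) == [[1, 0], [0, 1]]  # left inverse
```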

Some obvious properties are stated in the following proposition. We warn the reader that, contrary to what happens for the standard inverse, the generalized inverse of $AB$ need not be equal to $B^\dagger A^\dagger$.

Proposition 8.4.1 The following equalities hold for the generalized inverse:

$$(\lambda A)^\dagger = \lambda^{-1}A^\dagger \ (\lambda \ne 0), \qquad (A^\dagger)^\dagger = A, \qquad (A^\dagger)^* = (A^*)^\dagger.$$

If $A \in GL_n(\mathbb{C})$, then $A^\dagger = A^{-1}$.

Since $(AA^\dagger)^2 = AA^\dagger$, the matrix $AA^\dagger$ is a projector, which can therefore be described in terms of its range and kernel. Since $AA^\dagger$ is Hermitian, these subspaces are orthogonal to each other. Obviously, $R(AA^\dagger) \subset R(A)$. But since $AA^\dagger A = A$, the reverse inclusion holds too. Finally, we have

$$R(AA^\dagger) = R(A),$$

and $AA^\dagger$ is the orthogonal projector onto $R(A)$. Likewise, $A^\dagger A$ is an orthogonal projector. Obviously, $\ker A \subset \ker A^\dagger A$, while the identity $AA^\dagger A = A$ implies the reverse inclusion, so that

$$\ker A^\dagger A = \ker A.$$

Finally, $A^\dagger A$ is the orthogonal projector onto $(\ker A)^\perp$.

8.4.1 Solutions of the General Linear System

Given a matrix $M \in M_{n \times m}(\mathbb{C})$ and a vector $b \in \mathbb{C}^n$, let us consider the linear system

$$Mx = b. \qquad (8.2)$$

In (8.2), the matrix $M$ need not be square, nor even of full rank. From property 1, a necessary condition for the solvability of (8.2) is $MM^\dagger b = b$: if $Mx = b$, then $MM^\dagger b = MM^\dagger Mx = Mx = b$. Obviously, this is also sufficient, since it ensures that $x_0 := M^\dagger b$ is a solution. Hence, the generalized inverse plays one of the roles of the standard inverse, namely to provide one solution of (8.2) when it is solvable. To catch every solution of that system, it remains to solve the homogeneous problem $My = 0$. From the analysis above, $\ker M$ is nothing but the range of $I_m - M^\dagger M$. Therefore, we may state the following proposition:

Proposition 8.4.2 The system (8.2) is solvable if and only if $b = MM^\dagger b$. When it is solvable, its general solution is $x = M^\dagger b + (I_m - M^\dagger M)z$, where $z$ ranges over $\mathbb{C}^m$. Finally, the special solution $x_0 := M^\dagger b$ is the one of least Hermitian norm.

It remains to prove that $x_0$ has the smallest norm among the solutions. That comes from the Pythagorean theorem and from the fact that $x_0 \in R(M^\dagger) = (\ker M)^\perp$.
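Proposition 8.4.2 can be illustrated on a small underdetermined system. The sketch assumes a real $M$ of full row rank, for which $M^\dagger = M^T(MM^T)^{-1}$. That is a standard closed form, assumed here rather than taken from the text. To stay short, the code uses an explicit $2 \times 2$ inverse; the example data and names are hypothetical. It checks three things: $x_0 = M^\dagger b$ solves the system, adding $(I_m - M^\dagger M)z$ gives another solution, and that other solution is never shorter.

```python
from fractions import Fraction

def transpose(X):
    return [list(col) for col in zip(*X)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(X):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = X
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def sqnorm(col):
    return sum(v[0] ** 2 for v in col)

M = [[1, 2, 0], [0, 1, 1]]                      # 2 x 3, rank 2 (onto)
b = [[3], [1]]                                   # right-hand side as a column
Mp = mul(transpose(M), inv2(mul(M, transpose(M))))
x0 = mul(Mp, b)                                  # special solution M^+ b
P = [[int(i == j) - v for j, v in enumerate(row)]   # I_m - M^+ M, projector onto ker M
     for i, row in enumerate(mul(Mp, M))]

assert mul(M, x0) == b
for z in ([[1], [0], [0]], [[0], [5], [-2]], [[3], [1], [4]]):
    x = [[u[0] + w[0]] for u, w in zip(x0, mul(P, z))]  # general solution
    assert mul(M, x) == b and sqnorm(x) >= sqnorm(x0)
```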



8.5 Exercises 



1. Assume that there exists an algorithm for multiplying two $N \times N$ matrices with entries in a noncommutative ring by means of $K$ multiplications and $L$ additions. Show that the complexity of the multiplication in $M_n(k)$ is $O(n^\alpha)$, with $\alpha = \log K/\log N$.

2. What is the complexity of Choleski factorization?

3. Let $M \in SPD_n$ be also tridiagonal. What is the structure of $L$ in the Choleski factorization? More generally, what is the structure of $L$ when $m_{ij} = 0$ for $|i - j| > r$?

4. (continuation of exercise 3)

For $i \le n$, denote by $\phi(i)$ the smallest index $j$ such that $m_{ij} \ne 0$. In Choleski factorization, show that $l_{ij} = 0$ for every pair $(i, j)$ such that $j < \phi(i)$.

5. In the QR factorization, show that the map $M \mapsto (Q, R)$ is continuous on $GL_n(\mathbb{C})$.



6. Let $H$ be an $n \times n$ Hermitian matrix that blockwise reads

$$H = \begin{pmatrix} A & B^* \\ B & C \end{pmatrix}.$$

Assume that $A \in HPD_{n-k}$ ($1 \le k \le n - 1$).

Find a matrix $T$ of the form

$$T = \begin{pmatrix} I_{n-k} & 0 \\ X & I_k \end{pmatrix}$$

such that $THT^*$ is block-diagonal. Deduce that if $W \in H_k$, then

$$H - \begin{pmatrix} 0 & 0 \\ 0 & W \end{pmatrix}$$

is positive (semi)definite if and only if $S - W$ is, where $S$ is the Schur complement of $A$ in $H$.

7. (continuation of exercise 6)

Fix the size $k$ and denote by $S(H)$ the Schur complement in the Hermitian matrix $H$ when $A \in HPD_{n-k}$. Using the previous exercise, show that:

(a) $S(H + H') - S(H) - S(H')$ is positive semidefinite.

(b) If $H - H'$ is positive semidefinite, then so is $S(H) - S(H')$.

In other words, $H \mapsto S(H)$ is “concave nondecreasing” on the convex set formed of those matrices of $H_n$ such that $A \in HPD_{n-k}$, with values in the ordered set $H_k$. The article [26] gives a review of the properties of the map $H \mapsto S(H)$.

8. In Proposition 8.3.1, find an alternative proof of the uniqueness part, by inspection of the spectrum of the matrix $Q := Q_2^{-1}Q_1 = R_2R_1^{-1}$.

9. Identify the generalized inverse of row matrices and column matrices.

10. What is the generalized inverse of an orthogonal projector, that is, a Hermitian matrix $P$ satisfying $P^2 = P$? Deduce that the description of $AA^\dagger$ and $A^\dagger A$ as orthogonal projectors does not characterize $A^\dagger$ uniquely.

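11. Given a matrix $B \in M_{p \times q}(\mathbb{C})$ and a vector $a \in \mathbb{C}^p$, let us form the matrix $A := (B, a) \in M_{p \times (q+1)}(\mathbb{C})$.

(a) Let us define $d := B^\dagger a$, $c := a - Bd$, and

$$b := \begin{cases} c^\dagger & \text{if } c \ne 0, \\ (1 + |d|^2)^{-1} d^* B^\dagger & \text{if } c = 0. \end{cases}$$

Prove that

$$A^\dagger = \begin{pmatrix} B^\dagger - db \\ b \end{pmatrix}.$$

(b) Deduce an algorithm (Greville's algorithm) in $O(pq^2)$ operations for the computation of the generalized inverse of a $p \times q$ matrix. Hint: To get started with the algorithm, use Exercise 9.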
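For readers who want to experiment, here is one possible Python sketch of the column-by-column recursion of Exercise 11. It handles real matrices and uses exact `Fraction` arithmetic so that the test $c = 0$ is exact. It starts from the generalized inverse of the first column, using $a^\dagger = a^T/(a^Ta)$ for a column $a \ne 0$ and $0^\dagger = 0$, and it checks the four properties of Theorem 8.4.1 at the end. The function names are hypothetical, and the code is an illustration rather than a worked solution of the exercise.

```python
from fractions import Fraction

def dot(u, v):
    return sum((x * y for x, y in zip(u, v)), Fraction(0))

def greville_pinv(A):
    """Generalized inverse of a real p x q matrix (list of rows), column by column."""
    p, q = len(A), len(A[0])
    cols = [[Fraction(A[i][j]) for i in range(p)] for j in range(q)]
    a = cols[0]
    na = dot(a, a)
    Bp = [[x / na for x in a]] if na != 0 else [[Fraction(0)] * p]  # first column
    for k in range(1, q):
        a = cols[k]
        d = [dot(row, a) for row in Bp]                              # d = B^+ a
        c = [a[i] - sum(cols[j][i] * d[j] for j in range(k)) for i in range(p)]
        nc = dot(c, c)
        if nc != 0:
            b = [x / nc for x in c]                                  # b = c^+
        else:
            s = 1 + dot(d, d)
            b = [sum(d[j] * Bp[j][i] for j in range(k)) / s for i in range(p)]
        Bp = [[Bp[j][i] - d[j] * b[i] for i in range(p)] for j in range(k)] + [b]
    return Bp

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_moore_penrose(A, X):
    """Check AXA = A, XAX = X, and symmetry of AX and XA (real case)."""
    AX, XA = mul(A, X), mul(X, A)
    sym = lambda S: all(S[i][j] == S[j][i] for i in range(len(S)) for j in range(len(S)))
    return mul(AX, A) == A and mul(XA, X) == X and sym(AX) and sym(XA)

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # rank 2: third column = first + second
assert is_moore_penrose(A, greville_pinv(A))
```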
In this chapter the field of scalars is $K = \mathbb{R}$ or $\mathbb{C}$.

We have seen in the previous chapter a few direct methods for solving a linear system $Ax = b$, when $A \in M_n(K)$ is invertible. For example, if $A$ admits an LU factorization, the successive resolution of $Ly = b$, $Ux = y$ is called the Gauss method. When a leading principal minor of $A$ vanishes, a permutation of the columns allows us to return to the generic case. More generally, the Gauss method with pivoting consists in permuting the columns at each step of the factorization in such a way as to limit the magnitude of round-off errors and that of the condition number of the matrices $L$, $U$.

The direct computation of the solution of a Cramer's linear system $Ax = b$, by the Gauss method or by any other direct method, is rather costly, on the order of $n^3$ operations. It also presents several inconveniences. On the one hand, it does not exploit completely the sparse shape of many matrices $A$; in numerical analysis it happens frequently that an $n \times n$ matrix has only $O(n)$ nonzero entries, instead of $O(n^2)$. On the other hand, the computation of an LU factorization is rather unstable, because the round-off errors produced by the computer are amplified at each step of the computation.

For these reasons, one often uses an iterative method to compute an approximate solution $x^m$, instead of an exact solution. The iterative methods fully exploit the sparse structure of $A$. The number of operations is $O(am)$, where $a$ is the number of nonzero entries in $A$. The choice of $m$ depends on the accuracy that one requires a priori. It is, however, modest, because the error $\|x^m - x\|$ from the exact solution $x$ is of order $\text{constant} \times k^m$, where $k < 1$ whenever the method converges. Typically, a dozen iterations give a rather good result, and then $O(10a) \ll O(n^3)$. Another advantage of the iterative methods is that the round-off errors are damped during the computation, instead of being amplified.

General principle: Choose a decomposition of $A$ of the form $M - N$ and rewrite the system, assuming that $M$ is invertible:

$$x = M^{-1}(Nx + b).$$

Then choosing a starting vector $x^0 \in K^n$, which may be a rather coarse approximation of the solution, one constructs a sequence $(x^m)_{m \in \mathbb{N}}$ by induction:

$$x^{m+1} = M^{-1}(Nx^m + b). \qquad (9.1)$$

In practice, one does not compute $M^{-1}$ explicitly but one solves the linear systems $Mx^{m+1} = Nx^m + b$. It is thus important that this resolution be cheap. This will be the case when $M$ is triangular. In that case, the invertibility of $M$ can be read from its diagonal, since it occurs precisely when the diagonal entries are nonzero.
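The general principle can be sketched in a few lines of Python for a lower triangular $M$. Each step costs one forward substitution, and $M^{-1}$ is never formed. The splitting in the usage example, where $M$ is the lower triangle of $A$ with the diagonal included and $N = M - A$, is only an illustration. The function names are hypothetical.

```python
def forward_substitution(M, r):
    """Solve M x = r for a lower triangular M with nonzero diagonal."""
    x = [0.0] * len(r)
    for i in range(len(r)):
        x[i] = (r[i] - sum(M[i][j] * x[j] for j in range(i))) / M[i][i]
    return x

def splitting_iteration(M, N, b, x0, steps):
    """x^{m+1} = M^{-1}(N x^m + b), computed by solving M x^{m+1} = N x^m + b."""
    x = list(x0)
    n = len(b)
    for _ in range(steps):
        rhs = [sum(N[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        x = forward_substitution(M, rhs)
    return x

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
M = [[A[i][j] if j <= i else 0.0 for j in range(3)] for i in range(3)]
N = [[M[i][j] - A[i][j] for j in range(3)] for i in range(3)]   # so that A = M - N
b = [2.0, 4.0, 10.0]                                             # A (1, 2, 3)^T = b
x = splitting_iteration(M, N, b, [0.0, 0.0, 0.0], 50)
```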



9.1 A Convergence Criterion 

Definition 9.1.1 Let us assume that $A$ and $M$ are invertible, $A = M - N$. We say that an iterative method is convergent if for every pair $(x^0, b) \in K^n \times K^n$ we have

$$\lim_{m \to +\infty} x^m = A^{-1}b.$$

Proposition 9.1.1 An iterative method is convergent if and only if $\rho(M^{-1}N) < 1$.

Proof

If the method is convergent, then for $b = 0$,

$$\lim_{m \to +\infty} (M^{-1}N)^m x^0 = 0,$$

for every $x^0 \in K^n$. In other words,

$$\lim_{m \to +\infty} (M^{-1}N)^m = 0.$$

From Corollary 4.4.1, this implies $\rho(M^{-1}N) < 1$.

Conversely, if $\rho(M^{-1}N) < 1$, then by Proposition 4.4.1,

$$\lim_{m \to +\infty} (M^{-1}N)^m = 0,$$

and hence

$$x^m - A^{-1}b = (M^{-1}N)^m(x^0 - A^{-1}b) \to 0.$$
To be more precise, if $\|\cdot\|$ is a norm on $K^n$, then

$$\|x^m - A^{-1}b\| \le \|(M^{-1}N)^m\|\,\|x^0 - A^{-1}b\|.$$

From Householder's theorem (Theorem 4.2.1), there exists for every $\epsilon > 0$ a constant $C(\epsilon) < \infty$ such that

$$\|x^m - A^{-1}b\| \le C(\epsilon)\|x^0 - A^{-1}b\|\left(\rho(M^{-1}N) + \epsilon\right)^m.$$

In most cases (in fact, when there exists an induced norm satisfying $\|M^{-1}N\| = \rho(M^{-1}N)$), one can choose $\epsilon = 0$ in this inequality, so that

$$\|x^m - A^{-1}b\| = O\left(\rho(M^{-1}N)^m\right).$$

The choice of a vector $x^0$ such that $x^0 - A^{-1}b$ is an eigenvector associated to an eigenvalue of maximal modulus shows that this inequality cannot be improved in general. For this reason, we call the positive number

$$\tau := -\log \rho(M^{-1}N)$$

the convergence ratio of the method. Given two convergent methods, we say that the first one converges faster than the second one if $\tau_1 > \tau_2$. For example, we say that it converges twice as fast if $\tau_1 = 2\tau_2$. In fact, with an error of order $\rho(M^{-1}N)^m = \exp(-m\tau)$, we see that the faster method needs only half as many iterations to obtain the same accuracy.
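The convergence ratio translates directly into an iteration count: reducing the error by a factor $\varepsilon$ takes about $\log(1/\varepsilon)/\tau$ steps. The sketch below compares a method with spectral radius $\rho$ against one with radius $\rho^2$, which converges twice as fast in the above sense. The helper names are hypothetical.

```python
import math

def convergence_ratio(rho):
    """tau = -log rho, for an iteration matrix of spectral radius 0 < rho < 1."""
    return -math.log(rho)

def iterations_needed(rho, reduction):
    """Smallest m with rho**m <= reduction, i.e. m >= log(1/reduction) / tau."""
    return math.ceil(math.log(1 / reduction) / convergence_ratio(rho))

rho = 0.9
for r in (rho, rho ** 2):
    print(r, convergence_ratio(r), iterations_needed(r, 1e-6))
```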



9.2 Basic Methods 



There are three basic iterative methods, of which the first has only a historical or theoretical interest. Each uses the decomposition of $A$ into three parts: a diagonal one $D$, a strictly lower triangular one $-E$, and a strictly upper triangular one $-F$:

$$A = D - E - F = \begin{pmatrix} d_1 & & -F \\ & \ddots & \\ -E & & d_n \end{pmatrix}.$$

In all cases, one assumes that $D$ is invertible: The diagonal entries of $A$ are nonzero.



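Jacobi method: One chooses $M = D$; thus $N = E + F$. The iteration matrix is $J := D^{-1}(E + F)$. Knowing the vector $x^m$, one computes the components of the vector $x^{m+1}$ by the formula

$$a_{ii}x_i^{m+1} = b_i - \sum_{j \ne i} a_{ij}x_j^m, \qquad 1 \le i \le n.$$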
Gauss-Seidel method: One chooses $M = D - E$, and thus $N = F$. The iteration matrix is $G := (D - E)^{-1}F$. As we shall see below, one never computes $G$ explicitly. One computes the approximate solutions by a double induction, on $m$ on the one hand, and on $i \in \{1, \dots, n\}$ on the other hand:

$$a_{ii}x_i^{m+1} = b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{m+1} - \sum_{j=i+1}^{n} a_{ij}x_j^m.$$

The difference between the two methods is that in Gauss-Seidel one always uses the most recently computed values of each coordinate.
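The Jacobi and Gauss-Seidel updates described above can each be written as one sweep. In this sketch, which uses plain Python lists and hypothetical function names, the two functions differ only in where the updated coordinates are written: into a fresh vector for Jacobi, into the current one for Gauss-Seidel. The stopping test on the difference of successive iterates is a choice of this sketch, not something prescribed by the text.

```python
def jacobi_sweep(A, b, x):
    """x^{m+1}_i = (b_i - sum_{j != i} a_ij x^m_j) / a_ii, using only old values."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_sweep(A, b, x):
    """Same formula, but x_j for j < i has already been overwritten by x^{m+1}_j."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x

def iterate(sweep, A, b, tol=1e-10, max_iter=1000):
    """Repeat a sweep from x^0 = 0 until successive iterates differ by less than tol."""
    x = [0.0] * len(b)
    for m in range(1, max_iter + 1):
        y = sweep(A, b, x)
        if max(abs(u - v) for u, v in zip(x, y)) < tol:
            return y, m
        x = y
    return x, max_iter

A = [[10.0, 2.0, 1.0], [1.0, 8.0, 3.0], [2.0, -1.0, 9.0]]   # strictly diagonally dominant
b = [10.0, -1.0, 21.0]                                      # exact solution (1, -1, 2)
x_jacobi, m_jacobi = iterate(jacobi_sweep, A, b)
x_gs, m_gs = iterate(gauss_seidel_sweep, A, b)
```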





Relaxation method: It often happens that the Gauss-Seidel method converges exceedingly slowly. We thus wish to improve the Gauss-Seidel method by looking for a “best” approximated value of the $x_j$ (with $j < i$) when computing $x_i^{m+1}$. Instead of being simply $x_j^m$, as in the Jacobi method, or $x_j^{m+1}$, as in that of Gauss-Seidel, this best value will be an interpolation of both (we shall see that it is merely an extrapolation). This justifies the choice of

$$M = \frac{1}{\omega}D - E, \qquad N = \left(\frac{1}{\omega} - 1\right)D + F,$$

where $\omega \in \mathbb{C}$ is a parameter. This parameter remains, in general, constant throughout the calculations. The method is called successive relaxation. When $\omega > 1$, it bears the name successive overrelaxation (SOR). The iteration matrix is

$$\mathcal{L}_\omega := (D - \omega E)^{-1}\left((1 - \omega)D + \omega F\right).$$

The Gauss-Seidel method is a particular case of the relaxation method, with $\omega = 1$: $\mathcal{L}_1 = G$. Special attention is given to the choice of $\omega$, in order to reach the minimum of $\rho(\mathcal{L}_\omega)$. The computation of the approximate solutions is done through a double induction:

$$\frac{a_{ii}}{\omega}x_i^{m+1} = b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{m+1} - \sum_{j=i+1}^{n} a_{ij}x_j^m + \left(\frac{1}{\omega} - 1\right)a_{ii}x_i^m.$$
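Multiplying the componentwise relaxation formula by $\omega / a_{ii}$ gives the update $x_i^{m+1} = (1-\omega)x_i^m + \omega\,(b_i - \sum_{j<i} a_{ij}x_j^{m+1} - \sum_{j>i} a_{ij}x_j^m)/a_{ii}$. The following sketch implements that update, with a hypothetical function name and a real $\omega$. With $\omega = 1$ it reduces to a Gauss-Seidel sweep.

```python
def sor_sweep(A, b, x, omega):
    """One relaxation sweep; coordinates j < i already hold x^{m+1}_j."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
    return x

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]          # exact solution (1, 2, 3)
x = [0.0, 0.0, 0.0]
for _ in range(100):
    x = sor_sweep(A, b, x, 1.1)
```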

Without additional assumptions relative to the matrix $A$, the only result concerning the convergence is the following:

Proposition 9.2.1 We have $\rho(\mathcal{L}_\omega) \ge |\omega - 1|$. In particular, if the relaxation method converges for a matrix $A \in M_n(\mathbb{C})$ and a parameter $\omega \in \mathbb{C}$, then

$$|\omega - 1| < 1.$$

In other words, it is necessary that $\omega$ belong to the disk for which $(0, 2)$ is a diameter.
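Proof

If the method is convergent, we have $\rho(\mathcal{L}_\omega) < 1$. However, since $(1 - \omega)D + \omega F$ and $D - \omega E$ are triangular,

$$\det \mathcal{L}_\omega = \frac{\det((1 - \omega)D + \omega F)}{\det(D - \omega E)} = \frac{\det((1 - \omega)D)}{\det D} = (1 - \omega)^n.$$

Hence, $|\det \mathcal{L}_\omega|$ being the product of the moduli of the eigenvalues,

$$\rho(\mathcal{L}_\omega) \ge |\det \mathcal{L}_\omega|^{1/n} = |1 - \omega|.$$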



9.3 Two Cases of Convergence 

In this section and the following one we show that simple and natural 
hypotheses on A imply the convergence of the classical methods. We also 
compare their efficiencies. 



9.3.1 The Diagonally Dominant Case 

We assume here that one of the following two properties is satisfied:

1. $A$ is strictly diagonally dominant,

2. $A$ is irreducible and strongly diagonally dominant.

Proposition 9.3.1 Under one or the other of the hypotheses (1) and (2), the Jacobi method converges, as well as the relaxation method, with $\omega \in (0, 1]$.

Proof 

Jacobi method: The matrix $J = D^{-1}(E + F)$ is clearly irreducible if $A$ is. Furthermore,

$$\sum_{j=1}^{n} |J_{ij}| \le 1, \qquad i = 1, \dots, n,$$

in which all inequalities are strict if (1) holds, and at least one inequality is strict under the hypothesis (2). Then either Gershgorin's theorem (Theorem 4.5.1) or its improvement, Proposition 4.5.2 for irreducible matrices, yields $\rho(J) < 1$.

Relaxation method: We assume that $\omega \in (0, 1]$. Let $\lambda \in \mathbb{C}$ be a nonzero eigenvalue of $\mathcal{L}_\omega$. It is a root of

$$\det\left((1 - \omega - \lambda)D + \lambda\omega E + \omega F\right) = 0.$$

Hence, $\lambda + \omega - 1$ is an eigenvalue of $A' := \omega D^{-1}(\lambda E + F)$. This matrix is irreducible when $A$ is. Then Gershgorin's theorem (Theorem 4.5.1) shows that

$$|\lambda + \omega - 1| \le \max\left\{ \frac{\omega}{|a_{ii}|}\left(|\lambda|\sum_{j<i}|a_{ij}| + \sum_{j>i}|a_{ij}|\right) ; 1 \le i \le n \right\}.$$

If $|\lambda| \ge 1$, we deduce that

$$|\lambda + \omega - 1| \le \max\left\{ \frac{\omega|\lambda|}{|a_{ii}|}\sum_{j \ne i}|a_{ij}| ; 1 \le i \le n \right\}. \qquad (9.2)$$

In case (1), this yields

$$|\lambda + \omega - 1| < \omega|\lambda|,$$

so that $|\lambda| \le |\lambda + \omega - 1| + |1 - \omega| < |\lambda|\omega + 1 - \omega$; that is, $(|\lambda| - 1)(1 - \omega) < 0$, which is a contradiction. In case (2), Proposition 4.5.2 says that inequality (9.2) is strict. One concludes the proof the same way as in case (1).



Of course, this result is not fully satisfactory, since $\omega \le 1$ is not the hypothesis that we should consider. Recall that in practice, one uses overrelaxation (that is, $\omega > 1$), which turns out to be much more efficient than the Gauss-Seidel method for an appropriate choice of the parameter.

9.3.2 The Case of a Hermitian Positive Definite Matrix 

Let us begin with an intermediate result. 

Lemma 9.3.1 If $A$ and $M^* + N$ are Hermitian positive definite (in a decomposition $A = M - N$), then $\rho(M^{-1}N) < 1$.

Proof

Let us remark first that $M^* + N = M^* + M - A$ is necessarily Hermitian when $A$ is.

It is therefore enough to show that $\|M^{-1}Nx\|_A < \|x\|_A$ for every nonzero $x \in \mathbb{C}^n$, where $\|\cdot\|_A$ denotes the norm associated to $A$:

$$\|x\|_A = \sqrt{x^*Ax}.$$

We have $M^{-1}Nx = x - y$ with $y = M^{-1}Ax$. Hence, using $Ax = My$,

$$\|M^{-1}Nx\|_A^2 = \|x\|_A^2 - y^*Ax - x^*Ay + y^*Ay = \|x\|_A^2 - y^*(M^* + N)y.$$

We conclude by observing that $y$ is not zero; hence $y^*(M^* + N)y > 0$.
This proof gives a slightly more precise result than what was claimed: By taking the supremum of $\|M^{-1}Nx\|_A$ on the unit ball, which is compact, we obtain $\|M^{-1}N\| < 1$ for the matrix norm induced by $\|\cdot\|_A$.

The main application of this lemma is the following theorem. 

Theorem 9.3.1 If $A$ is Hermitian positive definite, then the relaxation method converges if and only if $|\omega - 1| < 1$.

Proof

We have seen in Proposition 9.2.1 that convergence implies $|\omega - 1| < 1$. Let us see the converse. We have $E^* = F$ and $D^* = D$. Thus

$$M^* + N = \left(\frac{1}{\bar\omega} + \frac{1}{\omega} - 1\right)D = \frac{1 - |\omega - 1|^2}{|\omega|^2}D.$$

Since $D$ is positive definite, $M^* + N$ is positive definite if and only if $|\omega - 1| < 1$, and Lemma 9.3.1 then gives $\rho(\mathcal{L}_\omega) < 1$.



However, Lemma 9.3.1 does not apply to the Jacobi method, since the hypothesis ($A$ positive definite) does not imply that $M^* + N = D + E + F$ must be positive definite. We shall see in an exercise that this method diverges for certain matrices $A \in HPD_n$, though it converges when $A \in HPD_n$ is tridiagonal.
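Here is one concrete instance of this phenomenon. It is given only as an illustration, and the exercise mentioned above may use a different example. For $0 < a < 1$, the $3 \times 3$ matrix with unit diagonal and all off-diagonal entries equal to $a$ is positive definite, with eigenvalues $1 + 2a$, $1 - a$, $1 - a$. Its Jacobi matrix, however, has the eigenvalue $-2a$. For $a = 0.8$ the Jacobi iterates therefore blow up, whereas Gauss-Seidel converges, as Theorem 9.3.1 with $\omega = 1$ predicts. The function names are hypothetical.

```python
def jacobi_sweep(A, b, x):
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_sweep(A, b, x):
    x = list(x)
    n = len(b)
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

def error_after(sweep, A, b, exact, steps):
    """Max-norm error after a number of sweeps started from x^0 = 0."""
    x = [0.0] * len(b)
    for _ in range(steps):
        x = sweep(A, b, x)
    return max(abs(u - v) for u, v in zip(x, exact))

a = 0.8
A = [[1.0, a, a], [a, 1.0, a], [a, a, 1.0]]   # positive definite for 0 < a < 1
b = [1.0, 1.0, 1.0]
exact = [1 / (1 + 2 * a)] * 3                  # A (1, 1, 1)^T = (1 + 2a)(1, 1, 1)^T
print(error_after(jacobi_sweep, A, b, exact, 30))          # grows roughly like 1.6**30
print(error_after(gauss_seidel_sweep, A, b, exact, 200))   # converges
```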



9.4 The Tridiagonal Case 



We consider here the case of tridiagonal matrices $A$, frequently encountered in the approximation of partial differential equations by finite differences or finite elements. The general structure of $A$ is the following:

$$A = \begin{pmatrix} a_{11} & a_{12} & & 0 \\ a_{21} & a_{22} & \ddots & \\ & \ddots & \ddots & a_{n-1,n} \\ 0 & & a_{n,n-1} & a_{nn} \end{pmatrix}.$$

In other words, the entries $a_{ij}$ are zero as soon as $|j - i| \ge 2$.

In many cases, these matrices are blockwise tridiagonal, meaning that the $a_{ij}$ are matrices, the diagonal blocks $a_{ii}$ being square matrices. In that case, the iterative methods also read blockwise, the decomposition $A = D - E - F$ being done blockwise. The corresponding iterative methods need the inversion of matrices of smaller sizes, namely the $a_{ii}$, usually done by a direct method. We shall not detail here this extension of the classical methods.

The structure of the matrix allows us to write a useful algebraic relation: 
Lemma 9.4.1 Let $\mu$ be a nonzero complex number and $C$ a tridiagonal matrix, of diagonal $C_0$, of upper triangular part $C_+$ and lower triangular part $C_-$. Then

$$\det C = \det\left(C_0 + \frac{1}{\mu}C_- + \mu C_+\right).$$

Proof

It is enough to observe that the matrix $C$ is conjugate to

$$C_0 + \frac{1}{\mu}C_- + \mu C_+,$$

through the linear transformation matrix

$$Q_\mu = \mathrm{diag}(1, \mu, \mu^2, \dots, \mu^{n-1}),$$

since $Q_\mu^{-1}CQ_\mu = C_0 + \mu^{-1}C_- + \mu C_+$.

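Let us apply the lemma to the computation of the characteristic polynomial $P_\omega$ of $\mathcal{L}_\omega$. We have

$$\begin{aligned} (\det D)P_\omega(\lambda) &= \det\left((D - \omega E)(\lambda I_n - \mathcal{L}_\omega)\right) \\ &= \det\left((\omega + \lambda - 1)D - \omega F - \lambda\omega E\right) \\ &= \det\left((\omega + \lambda - 1)D - \mu\omega F - \frac{\lambda\omega}{\mu}E\right) \end{aligned}$$

for every nonzero $\mu$. Let us choose for $\mu$ any square root of $\lambda$. We then have

$$\begin{aligned} (\det D)P_\omega(\mu^2) &= \det\left((\omega + \mu^2 - 1)D - \mu\omega(E + F)\right) \\ &= (\det D)\det\left((\omega + \mu^2 - 1)I_n - \mu\omega J\right). \end{aligned}$$

Finally, we have the following lemma.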

Lemma 9.4.2 If $A$ is tridiagonal and $D$ invertible, then

$$P_\omega(\mu^2) = (\mu\omega)^n P_J\left(\frac{\mu^2 + \omega - 1}{\mu\omega}\right),$$

where $P_J$ is the characteristic polynomial of the Jacobi matrix $J$.

Let us begin with the analysis of a simple case, that of the Gauss-Seidel method, for which $G = \mathcal{L}_1$.

Proposition 9.4.1 If $A$ is tridiagonal and $D$ invertible, then:

1. $P_G(\lambda^2) = \lambda^n P_J(\lambda)$, where $P_G$ is the characteristic polynomial of the Gauss-Seidel matrix $G$,

2. $\rho(G) = \rho(J)^2$,

3. the Gauss-Seidel method converges if and only if the Jacobi method converges; moreover, in case of convergence, the Gauss-Seidel method converges twice as fast as the Jacobi method;

4. the spectrum of $J$ is even: $\mathrm{Sp}\, J = -\mathrm{Sp}\, J$.

Proof 

Formula (1) comes from Lemma 9.4.2. The spectrum of $G$ is thus formed of $\lambda = 0$ (which is of multiplicity $[(n + 1)/2]$ at least) and of squares of the eigenvalues of $J$, which proves 2). Point 3 follows immediately, since $-\log \rho(G) = -2\log \rho(J)$. Finally, if $\mu \in \mathrm{Sp}\, J$, then $P_J(\mu) = 0$, and also $P_G(\mu^2) = 0$, so that $(-\mu)^n P_J(-\mu) = 0$. Finally, either $P_J(-\mu) = 0$, or $\mu = 0 = -\mu$, in which case $P_J(-\mu)$ also vanishes.

■ 
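Point 2 can be observed numerically. The sketch uses the tridiagonal matrix with 2 on the diagonal and $-1$ next to it, for which $\rho(J) = \cos(\pi/(n+1))$ is a classical fact. It runs each method on $Ax = 0$ and estimates the spectral radius from the decay of successive iterates. For Jacobi, the eigenvalues $\pm\rho(J)$ have the same modulus (point 4), so the decay is measured over two steps. This is a heuristic, power-iteration style estimate that assumes the starting vector has a component along a dominant eigenvector. The names are hypothetical.

```python
import math

def jacobi_sweep(A, b, x):
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_sweep(A, b, x):
    x = list(x)
    n = len(b)
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

def tridiag(n):
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

def estimated_radius(sweep, A, steps=60, stride=1):
    """(||x^m|| / ||x^{m-stride}||)^(1/stride) for the iteration on A x = 0."""
    n = len(A)
    xs = [[1.0] * n]
    for _ in range(steps):
        xs.append(sweep(A, [0.0] * n, xs[-1]))
    norm = lambda v: math.sqrt(sum(t * t for t in v))
    return (norm(xs[-1]) / norm(xs[-1 - stride])) ** (1 / stride)

n = 5
A = tridiag(n)
rho_J = estimated_radius(jacobi_sweep, A, stride=2)
rho_G = estimated_radius(gauss_seidel_sweep, A)
print(rho_J, math.cos(math.pi / (n + 1)), rho_G, rho_J ** 2)
```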

In fact, the comparison given in point 3 of the proposition holds under 
various assumptions. For example (see Exercises 3 and 8), it holds true 
when D is positive and E, F are nonnegative. 

We go back to the SOR, with an additional hypothesis: The spectrum of $J$ is real, and the Jacobi method converges. This property is satisfied, for instance, when $A$ is Hermitian positive definite, since Theorem 9.3.1 and Proposition 9.4.1 ensure the convergence of the Jacobi method, and since $J$ is similar to the Hermitian matrix $D^{-1/2}(E + F)D^{-1/2}$.

We also select a real $\omega$, that is, $\omega \in (0, 2)$, taking into account Proposition 9.2.1. The spectrum of $J$ is thus formed of the eigenvalues

$$-\lambda_r \le \cdots \le -\lambda_1 \le \lambda_1 \le \cdots \le \lambda_r = \rho(J) < 1,$$

from Proposition 9.4.1. This notation does not mean that $n$ is even: If $n$ is odd, $\lambda_1 = 0$. Aside from the zero eigenvalue, which does not enter into the computation of the spectral radius, the eigenvalues of $\mathcal{L}_\omega$ are the squares of the roots of

$$\mu^2 + \omega - 1 = \mu\omega\lambda_a, \qquad (9.3)$$

for $1 \le a \le r$. Indeed, taking $-\lambda_a$ instead of $\lambda_a$ furnishes the same squares.

Let us define $\Delta(\lambda) := \omega^2\lambda^2 + 4(1 - \omega)$, the discriminant of (9.3). If $\Delta(\lambda_a)$ is negative, both roots of (9.3) are complex conjugate, hence have modulus $|\omega - 1|^{1/2}$. The case $\Delta = 0$ furnishes the same modulus. If that discriminant is strictly positive, the roots are real and of distinct modulus. One of them, denoted by $\mu_a$, satisfies $\mu_a^2 > |\omega - 1|$, the other one satisfying the opposite inequality.

From Proposition 9.2.1, $\rho(\mathcal{L}_\omega)$ is thus equal to one of the following:

• $|\omega - 1|$, if $\Delta(\lambda_a) \le 0$ for every $a$, that is, if $\Delta(\rho(J)) \le 0$;

• the maximum of the $\mu_a^2$'s defined above, otherwise.
The first case corresponds to the choice $\omega \in [\omega_J, 2)$, where

$$\omega_J := \frac{2\left(1 - \sqrt{1 - \rho(J)^2}\right)}{\rho(J)^2} = \frac{2}{1 + \sqrt{1 - \rho(J)^2}} \in [1, 2).$$

Then $\rho(\mathcal{L}_\omega) = \omega - 1$.

The second case is $\omega \in (0, \omega_J)$. If $\Delta(\lambda_a) > 0$, let us denote by $Q_a(X)$ the polynomial $X^2 + \omega - 1 - X\omega\lambda_a$. The sum of its roots being positive, $\mu_a$ is the larger one; it is thus positive. Moreover, $Q_a(1) = \omega(1 - \lambda_a) > 0$ shows that both roots belong to the same half-line of $\mathbb{R} \setminus \{1\}$. Since their product has modulus less than one, they are less than one. In particular,
$$|\omega - 1|^{1/2} < \mu_a < 1.$$

This shows that $\rho(\mathcal{L}_\omega) < 1$ holds for every $\omega \in (0, 2)$. Under our hypotheses, the relaxation method is convergent.

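If $\lambda_a \ne \rho(J)$, we have $Q_r(\mu_a) = \mu_a\omega(\lambda_a - \rho(J)) < 0$. Hence, $\mu_a$ lies between both roots of $Q_r$, so that $\mu_a < \mu_r$. Finally, the case $\Delta(\rho(J)) > 0$ furnishes $\rho(\mathcal{L}_\omega) = \mu_r^2$. Differentiating the identity $Q_r(\mu_r) = 0$ with respect to $\omega$, we then have
$$(2\mu_r - \omega\rho(J))\frac{d\mu_r}{d\omega} + 1 - \mu_r\rho(J) = 0.$$
Since $2\mu_r$ is larger than the sum $\omega\rho(J)$ of the roots and since $\mu_r, \rho(J) \in [0, 1)$, one deduces that $d\mu_r/d\omega < 0$; hence $\omega \mapsto \rho(\mathcal{L}_\omega)$ is decreasing over $(0, \omega_J)$.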

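Since $\rho(\mathcal{L}_\omega) = \omega - 1$ is increasing on $[\omega_J, 2)$, we conclude that $\rho(\mathcal{L}_\omega)$ reaches its minimum at $\omega_J$, that minimum being
$$\omega_J - 1 = \frac{1 - \sqrt{1 - \rho(J)^2}}{1 + \sqrt{1 - \rho(J)^2}}.$$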



Theorem 9.4.1 [See Figure 9.1] Suppose that $A$ is tridiagonal, $D$ is invertible, and that the eigenvalues of $J$ are real and belong to $(-1, 1)$. Assume also that $\omega \in \mathbb{R}$.

Then the relaxation method converges if and only if $\omega \in (0, 2)$.

Furthermore, the convergence ratio is optimal for the parameter
$$\omega_J := \frac{2}{1 + \sqrt{1 - \rho(J)^2}} \in [1, 2),$$
where the spectral radius of $\mathcal{L}_{\omega_J}$ is
$$(\omega_J - 1 =)\ \frac{1 - \sqrt{1 - \rho(J)^2}}{1 + \sqrt{1 - \rho(J)^2}} = \left(\frac{1 - \sqrt{1 - \rho(J)^2}}{\rho(J)}\right)^2.$$
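As a numerical illustration of Theorem 9.4.1, the short Python sketch below evaluates $\rho(\mathcal{L}_\omega)$ directly from the case analysis above, using only $\rho(J)$. It assumes, as in the theorem, a tridiagonal $A$ whose Jacobi spectrum is real and contained in $(-1, 1)$; the function names `omega_opt` and `sor_radius` are ours. A grid search over $(0, 2)$ should land next to $\omega_J$.

```python
import math


def omega_opt(rho_j):
    """Optimal relaxation parameter omega_J = 2 / (1 + sqrt(1 - rho(J)^2))."""
    return 2.0 / (1.0 + math.sqrt(1.0 - rho_j ** 2))


def sor_radius(rho_j, omega):
    """rho(L_omega) for real omega in (0, 2), following the case analysis above.

    Only lambda_r = rho(J) matters.  If the discriminant
    Delta(rho_J) = omega^2 rho_J^2 + 4 (1 - omega) is <= 0, every root of (9.3)
    has modulus |omega - 1|^(1/2), so rho(L_omega) = |omega - 1|.  Otherwise
    rho(L_omega) = mu_r^2, mu_r being the larger real root of
    X^2 - omega * rho_J * X + omega - 1.
    """
    delta = omega ** 2 * rho_j ** 2 + 4.0 * (1.0 - omega)
    if delta <= 0.0:
        return abs(omega - 1.0)
    mu_r = (omega * rho_j + math.sqrt(delta)) / 2.0
    return mu_r ** 2


if __name__ == "__main__":
    rho_j = 0.9
    w_opt = omega_opt(rho_j)
    # Scan a fine grid of omega in (0, 2) and locate the smallest rho(L_omega).
    grid = [k / 10000.0 for k in range(1, 20000)]
    w_best = min(grid, key=lambda w: sor_radius(rho_j, w))
    print(f"omega_J = {w_opt:.4f}, grid minimizer = {w_best:.4f}")
    print(f"rho(L_1) = {sor_radius(rho_j, 1.0):.4f} (Gauss-Seidel, equals rho(J)^2)")
    print(f"rho(L_omega_J) = {sor_radius(rho_j, w_opt):.4f}")
```

Note that $\omega = 1$ returns $\rho(J)^2$, the Gauss-Seidel value, which is a quick consistency check on the formulas.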



Remarks: 



• We shall see in Exercise 7 that Theorem 9.4.1 extends to complex values of $\omega$: Under the same assumptions, $\rho(\mathcal{L}_\omega)$ is minimal at $\omega_J$, and the relaxation method converges if and only if $|\omega - 1| < 1$.

• The Gauss-Seidel method is not optimal in general; $\omega_J = 1$ holds only when $\rho(J) = 0$, though in practice $\rho(J)$ is close to 1. A typical example is the resolution of an elliptic PDE by the finite element method.

For values of $\rho(J)$ that are not too close to 1, the relaxation method with optimal parameter $\omega_J$, though improving the convergence ratio, is not overwhelmingly more efficient than Gauss-Seidel. In fact,
$$\frac{\rho(G)}{\rho(\mathcal{L}_{\omega_J})} = \left(1 + \sqrt{1 - \rho(J)^2}\right)^2$$
lies between 1 (for $\rho(J)$ close to 1) and 4 (for $\rho(J) = 0$), so that the ratio
$$\frac{\log \rho(\mathcal{L}_{\omega_J})}{\log \rho(G)}$$
remains moderate, as long as $\rho(J)$ keeps away from 1. However, in the realistic case where $\rho(J)$ is close to 1, we have
$$\frac{\log \rho(G)}{\log \rho(\mathcal{L}_{\omega_J})} \sim \sqrt{\frac{1 - \rho(J)}{2}},$$
which is very small. The number of iterations needed for a prescribed accuracy is multiplied by that ratio when one replaces the Gauss-Seidel method by the relaxation method with the optimal parameter.
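The asymptotic equivalence above can be checked numerically. The sketch below (the name `log_ratio` is ours) treats the tridiagonal case, so that $\rho(G) = \rho(J)^2$ and $\rho(\mathcal{L}_{\omega_J}) = \omega_J - 1$, and compares the exact ratio with $\sqrt{(1 - \rho(J))/2}$ as $\rho(J) \to 1$.

```python
import math


def log_ratio(rho_j):
    """log rho(G) / log rho(L_{omega_J}) for a tridiagonal matrix (rho(G) = rho(J)^2)."""
    s = math.sqrt(1.0 - rho_j ** 2)
    rho_g = rho_j ** 2
    rho_sor = (1.0 - s) / (1.0 + s)      # = omega_J - 1, Theorem 9.4.1
    return math.log(rho_g) / math.log(rho_sor)


for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    rho_j = 1.0 - eps
    exact = log_ratio(rho_j)
    approx = math.sqrt(eps / 2.0)        # sqrt((1 - rho(J)) / 2)
    print(f"1 - rho(J) = {eps:.0e}: ratio = {exact:.5f}, "
          f"sqrt((1 - rho(J))/2) = {approx:.5f}")
```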



9.5 The Method of the Conjugate Gradient 

We present here the conjugate gradient method in the most appropriate framework, namely that of systems $Ax = b$ where $A$ is real symmetric positive definite ($A \in \mathbf{SPD}_n$). As we shall see below, it is a direct method, in the sense that it furnishes the solution $\bar x$ after a finite number of iterations (at most $n$). However, round-off errors pollute the final result, and we would prefer to consider the conjugate gradient as an iterative method in which the number $N$ of iterations, much less than $n$, gives a rather good approximation of $\bar x$. We shall see that the choice of $N$ is linked to the condition number of the matrix $A$.

We denote by $(\cdot, \cdot)$ the canonical scalar product on $\mathbb{R}^n$. When $A \in \mathbf{SPD}_n$ and $b \in \mathbb{R}^n$, the function
$$x \mapsto J(x) := \frac{1}{2}(Ax, x) - (b, x)$$
is strictly convex and tends to infinity as $\|x\| \to +\infty$. It thus reaches its infimum at a unique point $\bar x$, which is the unique vector where the gradient of $J$ vanishes. We shall denote by $r$ (for residue) the gradient of $J$: $r(x) = Ax - b$. Hence $\bar x$ is the solution of the linear system $Ax = b$.

If $A\bar x = b$ and $x \in \mathbb{R}^n$, $x \ne \bar x$, then
$$J(x) = J(\bar x) + \frac{1}{2}(A(x - \bar x), x - \bar x) > J(\bar x). \qquad (9.4)$$

The conjugate gradient is thus a descent method.

We shall denote by $E$ the quadratic form associated to $A$: $E(x) := (Ax, x)$. It is the square of a norm on $\mathbb{R}^n$. The symbol $\perp_A$ indicates orthogonality with respect to the scalar product defined by $A$.

9.5.1 A Theoretical Analysis 

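Let $x_0 \in \mathbb{R}^n$ be given. We define $e_0 = x_0 - \bar x$, $r_0 = r(x_0) = Ae_0$. We may assume that $e_0 \ne 0$; otherwise, we would already have the solution. For $k \ge 1$, let us define the vector space
$$\mathcal{H}_k := \{P(A)r_0 \mid P \in \mathbb{R}[X],\ \deg P \le k - 1\}, \qquad \mathcal{H}_0 = \{0\}.$$
In $\mathcal{H}_{k+1}$, the linear subspace $\mathcal{H}_k$ is of codimension 0 or 1. In the first case, $\mathcal{H}_{k+1} = \mathcal{H}_k$, and it follows that $\mathcal{H}_{k+2} = A\mathcal{H}_{k+1} + \mathcal{H}_{k+1} = A\mathcal{H}_k + \mathcal{H}_k = \mathcal{H}_{k+1} = \mathcal{H}_k$, and thus by induction, $\mathcal{H}_k = \mathcal{H}_m$ for every $m \ge k$. Let us denote by $l$ the smallest index such that $\mathcal{H}_l = \mathcal{H}_{l+1}$. For $k < l$, $\mathcal{H}_k$ is thus of codimension one in $\mathcal{H}_{k+1}$, while if $k \ge l$, then $\mathcal{H}_k = \mathcal{H}_{k+1}$. It follows that $\dim \mathcal{H}_k = k$ if $k \le l$. In particular, $l \le n$.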

One can always find, by Gram-Schmidt orthonormalization, an $A$-orthogonal¹ basis (that is, such that $(Ap_j, p_i) = 0$ if $i \ne j$) $\{p_0, \ldots, p_{l-1}\}$ of $\mathcal{H}_l$ such that $\{p_0, \ldots, p_{k-1}\}$ is a basis of $\mathcal{H}_k$ when $k \le l$. The vectors $p_j$, which are not necessarily unit vectors, are defined, up to a scalar multiple, by
$$p_k \in \mathcal{H}_{k+1}, \qquad p_k \perp_A \mathcal{H}_k.$$



¹One must distinguish in this section between the two scalar products, namely $(\cdot, \cdot)$ and $(A\cdot, \cdot)$.
One says that the vectors $p_j$ are pairwise conjugate. Of course, conjugation means $A$-orthogonality. This explains the name of the method.

The quadratic function $J$, strictly convex, reaches its infimum on the affine subspace $x_0 + \mathcal{H}_k$ at a unique vector, which we denote by $x_k$. This notation makes sense for $k = 0$. If $x = y + \gamma p_k \in x_0 + \mathcal{H}_{k+1}$ with $y \in x_0 + \mathcal{H}_k$, then
$$\begin{aligned} J(x) &= J(\bar x) + \tfrac{1}{2}E(x - \bar x) \\ &= J(\bar x) + \tfrac{1}{2}E(y - \bar x) + \tfrac{\gamma^2}{2}E(p_k) + \gamma(Ap_k, y - \bar x) \\ &= J(y) + \tfrac{\gamma^2}{2}E(p_k) + \gamma(Ap_k, e_0), \end{aligned}$$
since $(Ap_k, y - x_0) = 0$. Hence, minimizing $J$ over $x_0 + \mathcal{H}_{k+1}$ amounts to minimizing $J$ over $x_0 + \mathcal{H}_k$, together with minimizing $\gamma \mapsto \frac{\gamma^2}{2}E(p_k) + \gamma(p_k, r_0)$ over $\mathbb{R}$. We therefore have
$$x_{k+1} - x_k \in \mathbb{R}p_k. \qquad (9.5)$$

By definition of $l$ there exists a nonzero polynomial $P$ of degree $l$ such that $P(A)r_0 = 0$, that is, $AP(A)e_0 = 0$. Since $A$ is invertible, $P(A)e_0 = 0$. Let us assume that $P(0)$ vanishes. Then $P(X) = XQ(X)$ with $\deg Q = l - 1$. Therefore, $Q(A)r_0 = 0$: The map $S \mapsto S(A)r_0$ is not one-to-one over the polynomials of degree less than or equal to $l - 1$. Hence $\dim \mathcal{H}_l < l$, a contradiction. Hence $P(0) \ne 0$, and we may assume that $P(0) = 1$. Then $P(X) = 1 - XR(X)$, where $\deg R = l - 1$. Thus $e_0 = R(A)r_0 \in \mathcal{H}_l$ or, equivalently, $\bar x \in x_0 + \mathcal{H}_l$. Conversely, if $k \le l$ and $\bar x \in x_0 + \mathcal{H}_k$, then $e_0 \in \mathcal{H}_k$; that is, $e_0 = Q(A)r_0$, where $\deg Q \le k - 1$. Then $Q_1(A)e_0 = 0$, where $Q_1(X) := 1 - XQ(X)$. Therefore, $Q_1(A)r_0 = 0$, $Q_1(0) \ne 0$, and $\deg Q_1 \le k$. Hence $k \ge l$, that is, $k = l$. Summing up, we have $\bar x \in x_0 + \mathcal{H}_l$ but $\bar x \notin x_0 + \mathcal{H}_{l-1}$. Therefore, $x_l = \bar x$ and $x_k \ne \bar x$ if $k < l$.

Lemma 9.5.1 Let us denote by $\lambda_n \ge \cdots \ge \lambda_1\ (> 0)$ the eigenvalues of $A$. If $k \le l$, then
$$E(x_k - \bar x) \le E(e_0) \cdot \min_{\deg Q \le k-1}\ \max_j |1 + \lambda_j Q(\lambda_j)|^2.$$



Proof

Let us compute
$$\begin{aligned} E(x_k - \bar x) &= \min\{E(x - \bar x) \mid x \in x_0 + \mathcal{H}_k\} \\ &= \min\{E(e_0 + y) \mid y \in \mathcal{H}_k\} \\ &= \min\{E((I_n + AQ(A))e_0) \mid \deg Q \le k - 1\} \\ &= \min\{\|(I_n + AQ(A))A^{1/2}e_0\|_2^2 \mid \deg Q \le k - 1\}, \end{aligned}$$
where we have used the equality $(Aw, w) = \|A^{1/2}w\|_2^2$. Hence
$$\begin{aligned} E(x_k - \bar x) &\le \min\{\|I_n + AQ(A)\|_2^2\,\|A^{1/2}e_0\|_2^2 \mid \deg Q \le k - 1\} \\ &= E(e_0)\min\{\rho(I_n + AQ(A))^2 \mid \deg Q \le k - 1\}, \end{aligned}$$
since $\rho(S) = \|S\|_2$ holds for every real symmetric matrix.



From Lemma 9.5.1, we deduce an estimate of the error $E(x_k - \bar x)$ by bounding the right-hand side by
$$\min_{\deg Q \le k-1}\ \max_{t \in [\lambda_1, \lambda_n]} |1 + tQ(t)|^2.$$



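Classically, the minimum is reached for
$$1 + XQ(X) = \omega_k T_k\!\left(\frac{2X - \lambda_1 - \lambda_n}{\lambda_n - \lambda_1}\right),$$
where $T_k$ is a Chebyshev polynomial:
$$T_k(t) = \begin{cases} \cos(k \arccos t) & \text{if } |t| \le 1, \\ \cosh(k \operatorname{arcosh} t) & \text{if } t \ge 1, \\ (-1)^k \cosh(k \operatorname{arcosh} |t|) & \text{if } t \le -1. \end{cases}$$
The number $\omega_k$ is the number that furnishes the value 1 at $X = 0$, namely
$$\omega_k = \frac{1}{T_k\!\left(-\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)} = \frac{(-1)^k}{\cosh\left(k \operatorname{arcosh}\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)}.$$
Then
$$\max_{t \in [\lambda_1, \lambda_n]} |1 + tQ(t)| = |\omega_k| = \frac{1}{\cosh\left(k \operatorname{arcosh}\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)}.$$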

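Hence $E(x_k - \bar x) \le |\omega_k|^2 E(e_0)$. However, if
$$\theta := \operatorname{arcosh}\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1},$$
then $|\omega_k| = (\cosh k\theta)^{-1} \le 2\exp(-k\theta)$, while $\exp(-\theta)$ is the root, less than one, of the quadratic polynomial
$$X^2 - 2\,\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\,X + 1.$$
Setting $K(A) := \|A\|_2\|A^{-1}\|_2 = \lambda_n/\lambda_1$, the condition number of $A$, we obtain
$$e^{-\theta} = \frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1} - \sqrt{\left(\frac{\lambda_n + \lambda_1}{\lambda_n - \lambda_1}\right)^2 - 1} = \frac{\sqrt{\lambda_n} - \sqrt{\lambda_1}}{\sqrt{\lambda_n} + \sqrt{\lambda_1}} = \frac{\sqrt{K(A)} - 1}{\sqrt{K(A)} + 1}.$$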

The final result is the following. 
Theorem 9.5.1 If $k \le l$, then
$$E(x_k - \bar x) \le 4E(x_0 - \bar x)\left(\frac{\sqrt{K(A)} - 1}{\sqrt{K(A)} + 1}\right)^{2k}. \qquad (9.6)$$
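To see what (9.6) means in practice, the following sketch (the helper names are ours) computes $e^{-\theta}$ both from its definition through $\operatorname{arcosh}$ and from the closed form, then finds the smallest $k$ for which the right-hand side of (9.6) falls below a prescribed fraction of $E(x_0 - \bar x)$. It only restates the bound; it does not run the method, and the bound is meaningful only for $k \le l$.

```python
import math


def cg_contraction(kappa):
    """e^{-theta} = (sqrt(K) - 1) / (sqrt(K) + 1), where K = K(A) = lambda_n / lambda_1."""
    root = math.sqrt(kappa)
    return (root - 1.0) / (root + 1.0)


def cg_iterations_for(kappa, tol):
    """Smallest k with 4 * e^{-2 k theta} <= tol, i.e. the bound (9.6)
    guarantees E(x_k - xbar) <= tol * E(x_0 - xbar)."""
    q2 = cg_contraction(kappa) ** 2
    k, bound = 0, 4.0
    while bound > tol:
        k += 1
        bound *= q2
    return k


if __name__ == "__main__":
    kappa = 100.0
    # (lambda_n + lambda_1) / (lambda_n - lambda_1) = (K + 1) / (K - 1)
    theta = math.acosh((kappa + 1.0) / (kappa - 1.0))
    print(math.exp(-theta), cg_contraction(kappa))   # the two values agree
    for kappa in (1e2, 1e4, 1e6):
        print(f"K(A) = {kappa:.0e}: k = {cg_iterations_for(kappa, 1e-6)}")
```

Since $e^{-\theta} \approx 1 - 2/\sqrt{K(A)}$ for large $K(A)$, the iteration count printed grows roughly like $\sqrt{K(A)}$.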

We now set $r_k = r(x_k) = A(x_k - \bar x)$. We have seen that $r_l = 0$ and that $r_k \ne 0$ if $k < l$. In fact, $r_k$ is the gradient of $J$ at $x_k$. The minimality of $J$ at $x_k$ over $x_0 + \mathcal{H}_k$ thus implies that $r_k \perp \mathcal{H}_k$ (for the usual scalar product). In other words, we have $(r_k, p_j) = 0$ if $j < k$. However, $x_k - \bar x \in e_0 + \mathcal{H}_k$ can also be written as $x_k - \bar x = Q(A)e_0$ with $\deg Q \le k$, which implies $r_k = Q(A)r_0$, so that $r_k \in \mathcal{H}_{k+1}$. If $k < l$, one therefore has $\mathcal{H}_{k+1} = \mathcal{H}_k \oplus \mathbb{R}r_k$.
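We now normalize $p_k$ (which was not done up to now) by
$$p_k - r_k \in \mathcal{H}_k.$$

In other words, $p_k$ is the $A$-orthogonal projection of $r_k = r(x_k)$, parallel to $\mathcal{H}_k$. It is actually an element of $\mathcal{H}_{k+1}$, since $r_k \in \mathcal{H}_{k+1}$. It is also nonzero since $r_k \notin \mathcal{H}_k$. We note that $r_k$ is orthogonal to $\mathcal{H}_k$ with respect to the usual scalar product, though $p_k$ is orthogonal to $\mathcal{H}_k$ with respect to the $A$-scalar product; this explains why $p_k$ and $r_k$ are generally different.

If $j \le k - 2$, we compute $(A(p_k - r_k), p_j) = -(Ar_k, p_j) = -(r_k, Ap_j) = 0$. We have used successively the conjugation of the $p_k$, the symmetry of $A$, the fact that $Ap_j \in \mathcal{H}_{j+2} \subset \mathcal{H}_k$, and the orthogonality of $r_k$ and $\mathcal{H}_k$. We have therefore $p_k - r_k \perp_A \mathcal{H}_{k-1}$; since $p_k - r_k \in \mathcal{H}_k = \mathcal{H}_{k-1} \oplus \mathbb{R}p_{k-1}$, this gives
$$p_k = r_k + \delta_k p_{k-1} \qquad (9.7)$$
for a suitable number $\delta_k$.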



9.5.2 Implementing the Conjugate Gradient 

The main feature of the conjugate gradient is the simplicity of the computation of the vectors $x_k$, which is done by induction. To begin with, we have $p_0 = r_0 = Ax_0 - b$, where $x_0$ is at our disposal. Let us assume now that $x_k$ and $p_{k-1}$ are known. Then $r_k = Ax_k - b$. If $r_k = 0$, we already have the solution. Otherwise, the formulas (9.5, 9.7) show that in fact, $x_{k+1}$ minimizes $J$ over the plane $x_k + \mathbb{R}r_k \oplus \mathbb{R}p_{k-1}$. We therefore have $x_{k+1} = x_k + \alpha_k r_k + \beta_k p_{k-1}$, where the numbers $\alpha_k, \beta_k$ are obtained by solving the linear system of two equations
$$\begin{cases} \alpha_k(Ar_k, r_k) + \beta_k(Ar_k, p_{k-1}) + \|r_k\|^2 = 0, \\ \alpha_k(Ar_k, p_{k-1}) + \beta_k(Ap_{k-1}, p_{k-1}) = 0 \end{cases}$$
(we have used $(r_k, p_{k-1}) = 0$). Then we have $\delta_k = \beta_k/\alpha_k$. Observe that $\alpha_k$ is nonzero, because otherwise $\beta_k$ would vanish and $r_k$ would too.

Summing up, the algorithm reads as follows:

• Choose $x_0$; then define $p_0 = r_0 = r(x_0) := Ax_0 - b$.

• For $k \ge 0$ with unit increment, do

– Compute $r_k = r(x_k) = Ax_k - b$. If $r_k = 0$, then $\bar x = x_k$.

– Otherwise, minimize $J(x_k + \alpha r_k + \beta p_{k-1})$ by computing $\alpha_k, \beta_k$ as above (for $k = 0$, only $\alpha_0$ is needed, since $p_0 = r_0$).

– Define
$$p_k = r_k + \frac{\beta_k}{\alpha_k}\,p_{k-1}, \qquad x_{k+1} = x_k + \alpha_k p_k.$$
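The algorithm above translates almost line by line into code. The following is a minimal pure-Python sketch, assuming $A$ is given as a dense list of rows; it keeps the text's sign convention $r(x) = Ax - b$ (so $\alpha_k < 0$) and solves the $2 \times 2$ system for $\alpha_k, \beta_k$ by Cramer's rule, rather than using the one-parameter update formulas found in most libraries. The function names and the stopping tolerance are our own choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Conjugate gradient in the two-parameter form described above.

    A is assumed real symmetric positive definite (list of rows).  At each step
    x_{k+1} minimizes J over x_k + R r_k + R p_{k-1}, with r(x) = A x - b.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 2 * n                 # exact arithmetic needs at most n steps
    x = list(x0)
    p_prev = [0.0] * n                   # p_{-1} := 0, so that p_0 = r_0
    for _ in range(max_iter):
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # r_k = A x_k - b
        rr = dot(r, r)
        if rr <= tol ** 2:
            break                        # r_k = 0 numerically: x_k solves Ax = b
        Ar = matvec(A, r)
        a11 = dot(Ar, r)                 # (A r_k, r_k)
        a12 = dot(Ar, p_prev)            # (A r_k, p_{k-1})
        a22 = dot(matvec(A, p_prev), p_prev)   # (A p_{k-1}, p_{k-1})
        if a22 == 0.0:                   # first step: only alpha_0 is needed
            alpha, beta = -rr / a11, 0.0
        else:                            # Cramer's rule for the 2 x 2 system
            det = a11 * a22 - a12 * a12
            alpha = -rr * a22 / det
            beta = rr * a12 / det
        delta = beta / alpha             # delta_k = beta_k / alpha_k, cf. (9.7)
        p = [ri + delta * pi for ri, pi in zip(r, p_prev)]   # p_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]        # x_{k+1}
        p_prev = p
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]                 # = A applied to (1, 2, 3)
    print(conjugate_gradient(A, b, [0.0, 0.0, 0.0]))
```

The determinant of the $2 \times 2$ system is positive whenever $r_k \ne 0$ and $p_{k-1} \ne 0$, by the Cauchy-Schwarz inequality for the $A$-scalar product, because $r_k \notin \mathcal{H}_k \ni p_{k-1}$.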

A priori, this computation furnishes the exact solution $\bar x$ in $l$ iterations. However, $l$ equals $n$ in general, and the cost of each iteration is $O(n^2)$. The conjugate gradient, viewed as a direct method, is thus rather slow. One often uses this method for sparse matrices, whose maximal number $m$ of nonzero elements per row is small compared to $n$. The complexity of an iteration is then $O(mn)$. However, that is still rather costly for a direct method ($O(mn^2)$ operations in all), since the complexity of iterative methods is also reduced for sparse matrices.

This explains why one prefers to consider the conjugate gradient as an iterative method, in which one makes only a few iterations $N \ll n$. Strictly speaking, Theorem 9.5.1 does not define a convergence rate $\tau$, since one does not have, in general, an inequality of the form
$$\|x_{k+1} - \bar x\| \le e^{-\tau}\|x_k - \bar x\|.$$



In particular, one is not certain that $\|x_1 - \bar x\|$ is smaller than $\|x_0 - \bar x\|$. However, the inequality (9.6) is analogous to what we have for a classical iterative method, up to the factor 4. We shall therefore say that the conjugate gradient admits a convergence rate $\tau_{CG}$ that satisfies
$$\tau_{CG} \ge \theta = \log\frac{\sqrt{K(A)} + 1}{\sqrt{K(A)} - 1}. \qquad (9.8)$$
This rate is equivalent to $2K(A)^{-1/2}$ when $K(A)$ is large. This method can be considered as an iterative method when $n\tau_{CG} \gg 1$, since then it is possible to choose $N \ll n$. Obviously, a sufficient condition is $K(A) \ll n^2$.

Application: Let us consider the resolution of the Laplace equation in an open bounded set $\Omega$ of $\mathbb{R}^d$ with a Dirichlet boundary condition, by the finite element method:
$$\Delta u = f \text{ in } \Omega, \qquad u = 0 \text{ on } \partial\Omega.$$
The matrix $A$ is symmetric, reflecting the symmetry of the variational formulation
$$\int_\Omega (\nabla u \cdot \nabla v + fv)\,dx = 0, \qquad \forall v \in H^1_0(\Omega).$$



If the diameter of the grid is $h$ with $0 < h \ll 1$, and if that grid is regular enough, the number of degrees of freedom (the size of the matrix) $n$ is of order $C/h^d$, where $C$ is a constant. The matrix is sparse with $m = O(1)$. Each iteration thus needs $O(n)$ operations. Finally, the condition number of $A$ is of order $c/h^2$. Hence, a number of iterations $N \gg 1/h$ is appropriate. This is worthwhile as soon as $d \ge 2$. The method becomes more useful as $d$ grows larger, since the threshold $1/h$ is independent of the dimension.
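The orders of magnitude in this application can be made concrete with a few lines of arithmetic. In the sketch below, the constants $C$ and $c$ are placeholders set to 1 (they are not given in the text), and the iteration count is taken of order $\sqrt{K(A)} \sim 1/h$, the threshold discussed above; the function name is ours.

```python
import math


def cg_vs_size(h, d, C=1.0, c=1.0):
    """Orders of magnitude for the model problem; C and c are placeholder constants.

    n ~ C / h^d unknowns, K(A) ~ c / h^2, and the CG threshold is
    N ~ sqrt(K(A)) ~ 1/h iterations (up to a factor depending on the tolerance).
    """
    n = C / h ** d
    kappa = c / h ** 2
    n_iter = math.sqrt(kappa)
    return n, n_iter, n_iter / n


for d in (1, 2, 3):
    n, n_iter, ratio = cg_vs_size(1e-2, d)
    print(f"d = {d}: n ~ {n:.0e}, N ~ {n_iter:.0e}, N/n ~ {ratio:.0e}")
```

For $d = 1$ the threshold is comparable to $n$, so nothing is gained; the ratio $N/n \sim h^{d-1}$ shrinks as soon as $d \ge 2$.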
Preconditioning: In practice, the performance of the method is improved by preconditioning the matrix $A$. The idea is to replace the system $Ax = b$ by $B^TABy = B^Tb$ (so that $x = By$), where the inversion of $B$ is easy; for example, $B$ is block-triangular or block-diagonal with small blocks. If $BB^T$ is close enough to $A^{-1}$, the condition number of the new matrix is smaller, and the number of iterations is reduced. Actually, when the condition number reaches its infimum $K = 1$, the new matrix $B^TAB$ is a multiple of $I_n$, and the solution is obvious. The simplest preconditioning consists in choosing $B = D^{-1/2}$. Its efficiency is clear in the (trivial) case where $A$ is diagonal, because the matrix of the new system is $I_n$, and the condition number is lowered to 1. Observe that preconditioning is also used with SOR, because it allows us to diminish the value of $\rho(J)$, and hence to increase the convergence rate. We shall see in Exercise 5 that, if $A \in \mathbf{SPD}_n$ is tridiagonal and if $D = dI_n$ (which corresponds to the preconditioning described above), the conjugate gradient method is twice as slow as the relaxation method with optimal parameter; that is,
$$\theta = \frac{1}{2}\tau_{RL}.$$

This equality is obtained by computing $\theta$ and the optimal convergence rate $\tau_{RL}$ of the relaxation method in terms of $\rho(J)$. In the real world, in which $A$ might not be tridiagonal, or be only blockwise tridiagonal, the map $\rho(J) \mapsto \theta$ remains the same, while $\tau_{RL}$ deteriorates. The conjugate gradient method then becomes more efficient than the relaxation method. It also has the advantage that it does not need the preliminary computation of $\rho(J)$, in contrast to the relaxation method with optimal parameter.
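The equality $\theta = \tau_{RL}/2$ can be checked numerically. The sketch below assumes, as in Exercise 5, that $A$ is tridiagonal with $D = dI_n$, so that the even spectrum of $J$ gives $K(A) = (1 + \rho(J))/(1 - \rho(J))$. It also assumes that the convergence rate of an iteration matrix $B$ is $-\log\rho(B)$, so that $\tau_{RL} = -\log(\omega_J - 1)$ by Theorem 9.4.1. The function names are ours.

```python
import math


def theta_cg(kappa):
    """theta = -log((sqrt(K) - 1) / (sqrt(K) + 1)), the CG rate bound (9.8)."""
    r = math.sqrt(kappa)
    return -math.log((r - 1.0) / (r + 1.0))


def tau_rl(rho_j):
    """Optimal SOR rate tau_RL = -log rho(L_{omega_J}) = -log(omega_J - 1)."""
    s = math.sqrt(1.0 - rho_j ** 2)
    return -math.log((1.0 - s) / (1.0 + s))


for rho_j in (0.5, 0.9, 0.99, 0.999):
    # Tridiagonal A with D = d I_n: Sp(A) = d (1 - Sp(J)) and Sp(J) = -Sp(J),
    # hence K(A) = (1 + rho(J)) / (1 - rho(J)).
    kappa = (1.0 + rho_j) / (1.0 - rho_j)
    print(f"rho(J) = {rho_j}: theta = {theta_cg(kappa):.6f}, "
          f"tau_RL / 2 = {tau_rl(rho_j) / 2.0:.6f}")
```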

The reader will find a deeper analysis of the method of the conjugate 
gradient in the article of J.-F. Maitre in [1]. 



9.6 Exercises 

1. Let $A$ be a tridiagonal matrix with an invertible diagonal and let $J$ be its Jacobi matrix. Show that $J$ is conjugate to $-J$. Compare with Proposition 9.4.1.

2. We fix $n \ge 2$. Use Theorem 3.4.2 to construct a matrix $A \in \mathbf{SPD}_n$ for which the Jacobi method does not converge. Show in particular that
$$\sup\{\rho(J) \mid A \in \mathbf{SPD}_n,\ D = I_n\} = n - 1.$$

3. Let $A \in M_n(\mathbb{R})$ satisfy $a_{ii} > 0$ for every index $i$, and $a_{ij} \le 0$ whenever $j \ne i$. Using (several times) the weak form of the Perron-Frobenius theorem, prove that either $1 \le \rho(J) \le \rho(G)$ or $\rho(G) \le \rho(J) \le 1$. In particular, as in point 3 of Proposition 9.4.1, the Jacobi and Gauss-Seidel methods converge or diverge simultaneously, and Gauss-Seidel is faster in the former case. Hint: Prove that
$$(\rho(G) \ge 1) \implies (\rho(J) \ge 1) \implies (\rho(G) \ge \rho(J))$$
and
$$(\rho(G) \le 1) \implies (\rho(J) \ge \rho(G)).$$

4. Let $n \ge 2$ and $A \in \mathbf{HPD}_n$ be given. Assume that $A$ is tridiagonal.

(a) Verify that the spectrum of $J$ is real and even.

(b) Show that the eigenvalues of $J$ satisfy $\lambda < 1$.

(c) Deduce that the Jacobi method is convergent.



5. Let $A \in \mathbf{HPD}_n$, $A = D - E - E^*$. Use the Hermitian norm $\|\cdot\|_2$.

(a) Show that $|((E + E^*)v, v)| \le \rho(J)\|D^{1/2}v\|^2$ for every $v \in \mathbb{C}^n$. Deduce that

(b) Let us define a function $g$ by
$$g(x) := \frac{\sqrt{x} - 1}{\sqrt{x} + 1}.$$
Verify that
$$g\!\left(\frac{1 + \rho(J)}{1 - \rho(J)}\right) = \frac{1 - \sqrt{1 - \rho(J)^2}}{\rho(J)}.$$

(c) Deduce that if $A$ is tridiagonal and if $D = dI_n$, then the convergence ratio $\theta$ of the conjugate gradient is half of that of SOR with optimal parameter.



6. Here is another proof of Theorem 9.3.1, when $\omega$ is real. Let $A \in \mathbf{HPD}_n$.

(a) Suppose we are given $\omega \in (0, 2)$.

i. Assume that $\lambda = e^{i\theta}$ ($\theta$ real) is an eigenvalue of $\mathcal{L}_\omega$. Show that $(1 - \omega - \lambda)e^{-i\theta/2} \in \mathbb{R}$.

ii. Deduce that $\lambda = 1$, then show that this case is impossible too.

iii. Let $m(\omega)$ be the number of eigenvalues of $\mathcal{L}_\omega$ of modulus less than or equal to one (counted with multiplicities). Show that $m$ is constant on $(0, 2)$.

(b) i. Compute
$$\lim_{\omega \to 0^+} \frac{1}{\omega}\left(\mathcal{L}_\omega - I_n\right).$$

ii. Deduce that $m = n$, hence that the SOR converges for every $\omega \in (0, 2)$.

7. (Extension of Theorem 9.4.1 to complex values of $\omega$.) We still assume that $A$ is tridiagonal, that the Jacobi method converges, and that the spectrum of $J$ is real. We retain the notation of Section 9.4.

(a) Given an index $a$ such that $\lambda_a > 0$, verify that $\Delta(\lambda_a)$ vanishes for two real values of $\omega$, of which only one, denoted by $\omega_a$, belongs to the open disk $D = D(1; 1)$. Show that $1 < \omega_a < 2$.

(b) Show that if $\omega \in D \setminus [\omega_a, 2)$, then the roots of $X^2 + \omega - 1 - \omega\lambda_a X$ have distinct moduli, with one and only one of them, denoted by $\mu_a(\omega)$, of modulus larger than $|\omega - 1|^{1/2}$.

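(c) Show that $\omega \mapsto \mu_a$ is holomorphic on its domain, and that
$$\lim_{\omega \to \gamma} |\mu_a(\omega)|^2 = 1 \ \text{ if } |\gamma - 1| = 1, \qquad \lim_{\omega \to \gamma} |\mu_a(\omega)|^2 = \gamma - 1 \ \text{ if } \gamma \in [\omega_a, 2).$$

(d) Deduce that $|\mu_a(\omega)| < 1$ (use the maximum principle), then that the relaxation method converges for every $\omega \in D$.

(e) Show, finally, that the spectral radius of $\mathcal{L}_\omega$ is minimal for $\omega = \omega_r$, which previously was denoted by $\omega_J$.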

8. Let $B$ be a cyclic matrix of order three. With square diagonal blocks, it reads blockwise as
$$B = \begin{pmatrix} 0 & 0 & M_1 \\ M_2 & 0 & 0 \\ 0 & M_3 & 0 \end{pmatrix}.$$
We wish to compare the Jacobi and Gauss-Seidel methods for the matrix $A := I - B$. Compute the matrix $G$. Show that $\rho(G) = \rho(J)^3$. Deduce that both methods converge or diverge simultaneously and that, in case of convergence, Gauss-Seidel is three times faster than
Jacobi. Show that for , the convergence or the divergence still 
holds simultaneously, but that Gauss-Seidel is only one and a half 
times faster. Generalize to cyclic matrices of any order p. 




10 

Approximation of Eigenvalues 



The computation of the eigenvalues of a square matrix is a problem of considerable difficulty. The naive idea, according to which it is enough to compute the characteristic polynomial and then find its roots, turns out to be hopeless because of Abel's theorem, which states that the general equation $P(x) = 0$, where $P$ is a polynomial of degree $d \ge 5$, is not solvable using algebraic operations and roots of any order. For this reason, there exists no direct method, even an expensive one, for the computation of $\operatorname{Sp}(M)$.

Dropping half of that program, one could compute the characteristic polynomial exactly, then compute an approximation of its roots. But the cost and the instability of the computation are prohibitive. Amazingly, the opposite strategy is often used: A standard algorithm for computing the roots of a polynomial of high degree consists in forming its companion matrix¹ and then applying to this matrix the QR algorithm to compute its eigenvalues with good accuracy.

Hence, all the methods are iterative. In particular, we shall limit ourselves to the cases $K = \mathbb{R}$ or $\mathbb{C}$. The general strategy consists in constructing a sequence of matrices
$$M^{(0)}, M^{(1)}, \ldots, M^{(m)}, M^{(m+1)}, \ldots,$$
pairwise similar, whose structure has some convergence property. Each method is conceived in such a way that the sequence converges to a simple form, triangular or diagonal, since then the eigenvalues can be read off the diagonal. Such convergence is not always possible. For example, an algorithm in $M_n(\mathbb{R})$ cannot converge to a triangular form when the matrix under consideration possesses a pair of nonreal eigenvalues.

¹Fortunately, the companion matrix is a Hessenberg matrix; see below for this notion and its practical aspects.

There are two strategies for the choice of $M^{(0)}$. One can naively take $M^{(0)} = M$. But since an iteration on a generic matrix is rather costly, one often uses a preliminary reduction to a simple form (for example the Hessenberg form, in the QR algorithm), which is preserved throughout the iterations. With a few such tricks, certain methods can be astonishingly efficient. The danger of iterative methods is the possible growth of round-off errors and errors in the data. Typically, a procedure that doubles the errors at each step transforms an initial error of size $10^{-3}$ into an $O(1)$ error after ten iterations (since $2^{10} \approx 10^3$), which is by no means acceptable. For this reason, it is important that the passage from $M^{(m)}$ to $M^{(m+1)}$ be contracting, that is, that the errors be damped, or at worst not be amplified. Since $M^{(m+1)}$ is conjugate to $M^{(m)}$ by some matrix $P$ (which in fact depends on $m$), the growth rate is approximately the number $K(P) := \|P\| \cdot \|P^{-1}\|$, called the condition number, which is always greater than or equal to one. Using the induced norm $\|\cdot\|_2$, it equals 1 if and only if $P$ is a similitude matrix; that is, $P \in \mathbb{C}^* \cdot \mathbf{U}_n$. For this reason, each iterative method builds sequences of unitarily similar matrices: The conjugation matrices are unitary (orthogonal if the ground field is $\mathbb{R}$).



10.1 Hessenberg Matrices 

Definition 10.1.1 A square matrix $M \in M_n(K)$ is called upper Hessenberg (one speaks simply of a Hessenberg matrix) if $m_{jk} = 0$ for every pair $(j, k)$ such that $j - k \ge 2$.

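A Hessenberg matrix thus has the form
$$\begin{pmatrix} x & \cdots & \cdots & \cdots & x \\ y & \ddots & & & \vdots \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & z & t \end{pmatrix}.$$
In particular, an upper triangular matrix is a Hessenberg matrix.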

When computing the spectrum of a given matrix, we may always restrict ourselves to the case of an irreducible matrix, using a conjugation by a permutation matrix: If $M$ is reducible, we may limit ourselves to a block-triangular matrix whose diagonal blocks are irreducible. It is enough then to compute the spectrum of each diagonal block. This principle applies as well to a Hessenberg matrix. Hence one may always assume that $M$ is Hessenberg and that the $m_{j+1,j}$'s are nonzero. In that case, the eigenspaces have dimension one. In fact, if $\lambda \in K$, let $L$ be the matrix extracted from $M - \lambda I_n$ by deletion of the first row and the last column. It is a triangular matrix of $M_{n-1}(K)$, invertible because its diagonal entries, the $m_{j+1,j}$'s, are nonzero. Hence, $M - \lambda I_n$ is of rank at least equal to $n - 1$, which implies that the dimension of $\ker(M - \lambda I_n)$ equals at most one.

Proposition 10.1.1 If $M \in M_n(K)$ is a Hessenberg matrix with $m_{j+1,j} \ne 0$ for every $j$, in particular if this matrix is irreducible, then the eigenvalues of $M$ are geometrically simple.

The example
$$M = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix},$$
whose characteristic polynomial is $X^2$, shows that the eigenvalues of an irreducible Hessenberg matrix are not necessarily algebraically simple.

From the point of view of matrix reduction by conjugation, one can attribute two advantages to the Hessenberg class, compared with the class of triangular matrices. First of all, if $K = \mathbb{R}$, many matrices are not trigonalizable in $\mathbb{R}$, though all are trigonalizable in $\mathbb{C}$. Of course, computing with complex numbers is more expensive than computing with real numbers. But we shall see that every square matrix with real entries is similar to a Hessenberg matrix over the real numbers. Next, if $K$ is algebraically closed, the trigonalization of $M$ needs the effective computation of the eigenvalues, which is impossible in view of Abel's theorem. However, a similar Hessenberg matrix is obtained after a finite number of operations.

Let us observe, finally, that like trigonalization (see Theorem 3.1.3), the Hessenberg form is obtained through unitary transformations, a well-conditioned process. When $K = \mathbb{R}$, these transformations are obviously real orthogonal.

Theorem 10.1.1 For every matrix $M \in M_n(\mathbb{C})$ there exists a unitary transformation $U$ such that $U^{-1}MU$ is a Hessenberg matrix. If $M \in M_n(\mathbb{R})$, one may take $U \in \mathbf{O}_n$.

Moreover, the matrix $U$ is computable in $5n^3/3 + O(n^2)$ multiplications and $4n^3/3 + O(n^2)$ additions.

Proof

Let $X \in \mathbb{C}^m$ be a unit vector: $X^*X = 1$. The matrix of the unitary (orthogonal) symmetry with respect to the hyperplane $X^\perp$ is $S = I_m - 2XX^*$. In fact, $SX = X - 2X = -X$, while $Y \in X^\perp$, that is, $X^*Y = 0$, implies $SY = Y$.
We construct a sequence $M_1 = M, \ldots, M_{n-1}$ of unitarily similar matrices. The matrix $M_{n-r}$ will be of the form
$$M_{n-r} = \begin{pmatrix} H & B \\ \left(0_{r,n-r-1}\ \ Z\right) & N \end{pmatrix},$$
where $H \in M_{n-r}(\mathbb{C})$ is Hessenberg and $Z$ is a vector in $\mathbb{C}^r$. Hence, $M_{n-1}$ will be suitable.

One passes from $M_{n-r}$ to $M_{n-r+1}$, that is, from $r$ to $r - 1$, in the following way. Let $e^1$ be the first vector of the canonical basis of $\mathbb{C}^r$. If $Z$ is colinear to $e^1$, one does nothing besides defining $M_{n-r+1} = M_{n-r}$. Otherwise, one chooses $X \in \mathbb{C}^r$ so that $SZ$ is parallel to $e^1$ (we discuss below the possible choices for $X$). Then one sets
$$V := \begin{pmatrix} I_{n-r} & 0_{n-r,r} \\ 0_{r,n-r} & S \end{pmatrix},$$
which is a unitary matrix, with $V^* = V^{-1} = V$ (such a matrix is called a Householder matrix). We then have
$$V^{-1}M_{n-r}V = \begin{pmatrix} H & BS \\ \left(0_{r,n-r-1}\ \ SZ\right) & SNS \end{pmatrix}.$$
We thus define $M_{n-r+1} = V^{-1}M_{n-r}V$.

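There are two possible choices for $S$. Writing the first entry of $Z$ as $z_1 = |z_1|e^{i\varphi}$ (any $\varphi$ if $z_1 = 0$), they are given by
$$X_\pm := \frac{1}{\left\|Z \pm \|Z\|_2 e^{i\varphi}e^1\right\|_2}\left(Z \pm \|Z\|_2 e^{i\varphi}e^1\right).$$
It is always advantageous to choose the sign that gives the largest denominator, namely the positive sign. One thus optimizes the round-off errors when $Z$ is almost aligned with $e^1$.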

Let us consider now the complexity of the $(n - r)$th step. Only the terms of order $r^2$ and $r(n - r)$ are meaningful. The computation of $X$, in $O(r)$ operations, is thus negligible, like that of $X^*$ and of $2X$. The computation of $BS = B - (BX)(2X^*)$ needs about $4r(n - r)$ operations. Then $2NX$ needs $2r^2$ operations, as does $2X^*N$. We next compute $4X^*NX$, and then form the vector $T := 4(X^*NX)X - 2NX$ at the cost $O(r)$. The product $TX^*$ takes $r^2$ operations, as does $2X(X^*N)$. Then $SNS = N + TX^* - X(2X^*N)$ needs $2r^2$ additions. The complete step is thus accomplished in $7r^2 + 4r(n - r) + O(n)$ operations. A sum from $r = 1$ to $n - 2$ yields a complexity of $3n^3 + O(n^2)$, in which one recognizes $5n^3/3 + O(n^2)$ multiplications, $4n^3/3 + O(n^2)$ additions, and $O(n)$ square roots.
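Here is a sketch of the reduction in the proof, restricted for simplicity to real matrices: then $U \in \mathbf{O}_n$, and the sign in front of $\|Z\|_2 e^1$ is taken equal to the sign of the first entry $z_1$ of $Z$, which maximizes the denominator. It applies each Householder symmetry directly to rows and columns, without forming $V$, and makes no attempt to reach the operation counts above; the function name is ours.

```python
import math


def hessenberg(M):
    """Return V^T M V in upper Hessenberg form, V a product of Householder matrices.

    Real-matrix sketch of the proof of Theorem 10.1.1: at step c, Z is the part of
    column c strictly below the diagonal, and S = I - 2 X X^T with X proportional
    to Z + sign(z_1) ||Z||_2 e^1, which sends Z to a multiple of e^1.
    """
    n = len(M)
    H = [list(map(float, row)) for row in M]
    for c in range(n - 2):
        z = [H[i][c] for i in range(c + 1, n)]
        if all(t == 0.0 for t in z[1:]):
            continue                           # Z already colinear to e^1
        norm_z = math.sqrt(sum(t * t for t in z))
        w = z[:]
        w[0] += norm_z if z[0] >= 0.0 else -norm_z
        norm_w = math.sqrt(sum(t * t for t in w))
        x = [t / norm_w for t in w]            # unit vector X
        m = len(x)
        # H <- V H: apply S to rows c+1, ..., n-1
        for j in range(n):
            s = sum(x[i] * H[c + 1 + i][j] for i in range(m))
            for i in range(m):
                H[c + 1 + i][j] -= 2.0 * x[i] * s
        # H <- H V: apply S to columns c+1, ..., n-1
        for i in range(n):
            s = sum(H[i][c + 1 + k] * x[k] for k in range(m))
            for k in range(m):
                H[i][c + 1 + k] -= 2.0 * s * x[k]
    return H


if __name__ == "__main__":
    M = [[4, 1, 2, 3], [2, 5, 1, 1], [3, 1, 6, 2], [1, 2, 1, 7]]
    for row in hessenberg(M):
        print(["%8.4f" % t for t in row])
```

Applied to a symmetric matrix, the same routine returns (up to round-off) a symmetric tridiagonal matrix, as remarked below.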



When $M$ is Hermitian, the matrix $U^{-1}MU$ is still Hermitian. Since it is Hessenberg, it is tridiagonal, with $a_{j,j+1} = \overline{a_{j+1,j}}$ and $a_{jj} \in \mathbb{R}$. The symmetry reduces the complexity to $2n^3/3 + O(n^2)$ multiplications. One can then use the Hessenberg form of $M$ in order to localize its eigenvalues.
Proposition 10.1.2 If $M$ is tridiagonal Hermitian and if the entries $m_{j+1,j}$ are nonzero (that is, if $M$ is irreducible), then the eigenvalues of $M$ are real and simple. Furthermore, if $M_j$ is the (Hermitian, tridiagonal, irreducible) matrix obtained by keeping only the $j$ last rows and columns of $M$, the eigenvalues of $M_j$ strictly separate those of $M_{j+1}$.

The separation, not necessarily strict, of the eigenvalues of $M_{j+1}$ by those of $M_j$ has already been proved, in a more general framework, in Theorem 3.3.3.

Proof

The eigenvalues of a Hermitian matrix are real. Since this matrix is diagonalizable, Proposition 10.1.1 shows that the eigenvalues are simple. Both properties can be deduced from the following analysis.

We proceed by induction on $j$. If $j \ge 1$, we decompose the matrix $M_{j+1}$ blockwise:
$$M_{j+1} = \begin{pmatrix} m & \bar a & 0 & \cdots & 0 \\ a & & & & \\ 0 & & & & \\ \vdots & & M_j & & \\ 0 & & & & \end{pmatrix},$$
where $a \ne 0$ and $m \in \mathbb{R}$. Let $P_j$ be the characteristic polynomial of $M_j$. We compute that of $M_{j+1}$ by expanding according to the elements of the first column:
$$P_{j+1}(X) = (X - m)P_j(X) - |a|^2P_{j-1}(X), \qquad (10.1)$$
where $P_0 = 1$ by convention.

The induction hypothesis is as follows: $P_j$ and $P_{j-1}$ have real coefficients and have respectively $j$ and $j - 1$ real roots $\mu_1, \ldots, \mu_j$ and $\sigma_1, \ldots, \sigma_{j-1}$, with
$$\mu_1 < \sigma_1 < \mu_2 < \cdots < \sigma_{j-1} < \mu_j.$$
In particular, they have no other roots, and their roots are simple. The signs of the values of $P_{j-1}$ at the points $\mu_k$ thus alternate. Since $P_{j-1}$ is positive over $(\sigma_{j-1}, +\infty)$, we have $P_{j-1}(\mu_j) > 0$, and more generally $(-1)^{j-k}P_{j-1}(\mu_k) > 0$.

This hypothesis clearly holds at step $j = 1$. If $j \ge 1$ and if it holds at step $j$, then (10.1) shows that $P_{j+1} \in \mathbb{R}[X]$. Furthermore,
$$(-1)^{j-k}P_{j+1}(\mu_k) = -|a|^2(-1)^{j-k}P_{j-1}(\mu_k) < 0.$$
From the intermediate value theorem, $P_{j+1}$ possesses a root $\lambda_k$ in $(\mu_{k-1}, \mu_k)$ for $2 \le k \le j$. Furthermore, $P_{j+1}(\mu_j) < 0$, and $P_{j+1}(x)$ is positive for $x \gg 1$; hence there is also a root in $(\mu_j, +\infty)$. Likewise, $P_{j+1}$ has a root in $(-\infty, \mu_1)$. Hence, $P_{j+1}$ possesses $j + 1$ distinct real roots $\lambda_k$, with
$$\lambda_1 < \mu_1 < \lambda_2 < \cdots < \mu_j < \lambda_{j+1}.$$
Since $P_{j+1}$ has degree $j + 1$, there is no root other than the $\lambda_k$'s, and these are simple.
The sequence of polynomials $P_j$ is a Sturm sequence, which allows us to compute the number of roots of $P_n$ in a given interval $(a, b)$. A Sturm sequence is a finite sequence of real polynomials $Q_0, \ldots, Q_n$, with $Q_0$ a nonzero constant, such that $Q_j(x) = 0$ and $0 < j < n$ imply $Q_{j+1}(x)Q_{j-1}(x) < 0$. In particular, $Q_j$ and $Q_{j+1}$ do not share a common root. If $a \in \mathbb{R}$ is not a root of $Q_n$, we denote by $V(a)$ the number of sign changes in the sequence $(Q_0(a), \ldots, Q_n(a))$, with the zeros playing no role.

Proposition 10.1.3 If $Q_n(a) \ne 0$ and $Q_n(b) \ne 0$, and if $a < b$, then the number of roots of $Q_n$ in $(a, b)$ is equal to $V(a) - V(b)$.

Let us remark that it is not necessary to compute the polynomials $P_j$ in order to apply this proposition to them. Given $a \in \mathbb{R}$, it is enough to compute the sequence of values $P_j(a)$, for instance through the recursion (10.1).

Once an interval $(a, b)$ is known to contain an eigenvalue $\lambda$ and only that one (by means of Proposition 10.1.3 or Theorem 4.5.1), one can compute an approximate value of $\lambda$, either by dichotomy, or by computing the numbers $V((a + b)/2), \ldots$, or by the secant or Newton method. In the latter case, one must compute $P_n$ itself. The last two methods are convergent, provided that we have a good initial approximation at our disposal, because $P_n'(\lambda) \ne 0$.
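Proposition 10.1.3 and the dichotomy just described fit in a few lines. The sketch below, for a real symmetric tridiagonal matrix given by its diagonal `diag` and off-diagonal `off` (these names are ours), evaluates the values $P_j(a)$ through (10.1) without ever forming the polynomials, counts sign changes, and bisects an interval known to contain a single eigenvalue.

```python
def sturm_values(diag, off, a):
    """Values P_0(a), ..., P_n(a) from recursion (10.1).

    diag[i] is the i-th diagonal entry and off[i] the entry coupling i and i+1.
    As in the proof, M_j is the trailing j x j block, so the recursion starts
    from the bottom-right corner.
    """
    n = len(diag)
    vals = [1.0, a - diag[n - 1]]
    for j in range(1, n):
        m = diag[n - 1 - j]              # top-left entry of M_{j+1}
        c = abs(off[n - 1 - j])          # |a|, coupling that entry to M_j
        vals.append((a - m) * vals[-1] - c * c * vals[-2])
    return vals


def sign_changes(vals):
    """V(a): number of sign changes, zeros playing no role."""
    signs = [v > 0.0 for v in vals if v != 0.0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def count_eigenvalues(diag, off, a, b):
    """Number of eigenvalues in (a, b) (Proposition 10.1.3; needs P_n(a), P_n(b) != 0)."""
    return (sign_changes(sturm_values(diag, off, a))
            - sign_changes(sturm_values(diag, off, b)))


def bisect_eigenvalue(diag, off, a, b, tol=1e-12):
    """Dichotomy on an interval (a, b) known to contain exactly one eigenvalue."""
    while b - a > tol:
        mid = 0.5 * (a + b)
        if sturm_values(diag, off, mid)[-1] == 0.0:
            return mid                   # mid is itself an eigenvalue
        if count_eigenvalues(diag, off, a, mid) == 1:
            b = mid
        else:
            a = mid
    return 0.5 * (a + b)


if __name__ == "__main__":
    diag, off = [2.0] * 4, [-1.0] * 3    # eigenvalues 2 - 2 cos(k pi / 5), k = 1..4
    print(count_eigenvalues(diag, off, 0.0, 4.0))
    print(bisect_eigenvalue(diag, off, 0.0, 1.0))
```

Each evaluation costs $O(n)$ operations, which is why this approach pairs naturally with the tridiagonal form obtained from Theorem 10.1.1 in the Hermitian case.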

We end this section with an obvious but nevertheless useful remark. If $M$ is Hessenberg and $T$ upper triangular, the products $TM$ and $MT$ are still Hessenberg (that would not be true if both matrices were Hessenberg). For example, if $M$ admits an LU factorization, then $L$ is Hessenberg, and thus has only two nonzero diagonals, because $L = MU^{-1}$. Similarly, if $M \in \mathbf{GL}_n(\mathbb{C})$, then the factor $Q$ of the factorization $M = QR$ is again Hessenberg, because $Q = MR^{-1}$. An elementary compactness and continuity argument shows that the same fact holds true for every $M \in M_n(\mathbb{C})$.



10.2 The QR Method

The QR method is considered the most efficient one for the approximate computation of the whole spectrum of a square matrix $M \in M_n(\mathbb{C})$. One employs it only after having reduced $M$ to Hessenberg form, because this form is preserved throughout the algorithm, while each iteration is much cheaper than it is for a generic matrix.



10.2.1 Description of the QR Method 

Let $A \in M_n(K)$ be given, with $K = \mathbb{R}$ or $\mathbb{C}$. We construct a sequence of matrices $(A_j)_{j \in \mathbb{N}}$, with $A_0 = A$. The induction $A_j \mapsto A_{j+1}$ consists in performing the QR factorization of $A_j$, $A_j = Q_jR_j$, and then defining $A_{j+1} := R_jQ_j$. We then have
$$A_{j+1} = Q_j^{-1}A_jQ_j,$$
which shows that $A_{j+1}$ is unitarily similar to $A_j$. Hence,
$$A_j = (Q_0 \cdots Q_{j-1})^{-1}A(Q_0 \cdots Q_{j-1}) \qquad (10.2)$$
is conjugate to $A$ by a unitary transformation.

Let $P_j := Q_0 \cdots Q_{j-1}$, which is unitary. Since $\mathbf{U}_n$ is compact, the sequence $(P_j)_{j \in \mathbb{N}}$ possesses cluster values. Let $P$ be one of them. Then $A' := P^{-1}AP = P^*AP$ is a cluster point of $(A_j)_{j \in \mathbb{N}}$. Hence, if the sequence $(A_j)_j$ converges, its limit is unitarily similar to $A$, hence has the same spectrum.

This argument shows that, in general, the sequence $(A_j)_j$ does not converge to a diagonal matrix. Otherwise the eigenvectors of $A$ would be the columns of $P$. In other words, $A$ would have an orthonormal eigenbasis; namely, $A$ would be normal. Except in this special case, one merely expects the sequence $(A_j)_j$ to converge to a triangular matrix. This expectation is compatible with Theorem 3.1.3.

But even this hope is too optimistic in general. For example, if $A$ is unitary, then $A_j = A$ for every $j$, with $Q_j = A$ and $R_j = I_n$. In that case the convergence is useless, since the limit $A$ is not simpler than the data. We shall see later on that the reason for this bad behavior is that the eigenvalues of a unitary matrix have the same modulus: the QR method does not do a good job of separating eigenvalues of close modulus.

An important case in which a matrix may have two eigenvalues of the same modulus is that of matrices with real entries, since their nonreal eigenvalues come in complex conjugate pairs. If $A\in M_n(\mathbb R)$, then each $Q_j$ is real orthogonal, $R_j$ is real, and $A_j$ is real, as is seen by induction on $j$. A limit $A'$ will therefore not be triangular if some eigenvalues of $A$ are nonreal, that is, if $A$ possesses a pair of complex conjugate eigenvalues.

Let us sum up what can be expected in a brave new world. If all the eigenvalues of $A\in M_n(\mathbb C)$ have distinct moduli, the sequence $(A_j)_j$ might converge to a triangular matrix. At the very least, its lower triangular part might converge to

$$\begin{pmatrix}\lambda_1 & & & \\ 0 & \lambda_2 & & \\ \vdots & \ddots & \ddots & \\ 0 & \cdots & 0 & \lambda_n\end{pmatrix}.$$

When $A\in M_n(\mathbb R)$, one makes the following assumption. Let $p$ be the number of real eigenvalues and $2q$ the number of nonreal eigenvalues; we assume that there are $p+q$ distinct eigenvalue moduli. In that case, $(A_j)_j$ might converge to a block-triangular form whose diagonal blocks are $2\times 2$ or $1\times 1$. The eigenvalues of $A$ are then read off trivially from the limits of the diagonal blocks.
The assertions made above have never been proved in full generality, to our knowledge. We shall give below a rather satisfactory result in the complex case.



10.2.2 The Case of a Singular Matrix 

When $A$ is not invertible, the QR factorization is not unique, which raises a difficulty in the definition of the algorithm. Computing the determinant would detect noninvertibility immediately, but would not provide any remedy. However, if the matrix has first been reduced to Hessenberg form, then a single QR iteration both detects the case and provides a remedy.

Indeed, suppose $A$ is Hessenberg but not invertible, and $A = QR$. Then $Q$ is Hessenberg and $R$ is not invertible. There are two cases.

- If $a_{21}=0$, the matrix $A$ is block-triangular. Deleting the first row and the first column reduces us to a matrix of size $(n-1)\times(n-1)$.
- Otherwise, the first column of $A$ is nonzero, so that $r_{11}>0$, and there exists $j\ge 2$ such that $r_{jj}=0$. The matrix $A_1 = RQ$ is then block-triangular, because it is Hessenberg and $(A_1)_{j,j-1} = r_{jj}q_{j,j-1} = 0$. We are thus reduced to computing the spectra of the two diagonal blocks of $A_1$, of sizes $(j-1)\times(j-1)$ and $(n-j+1)\times(n-j+1)$.

After finitely many such steps (no more than the multiplicity of the null eigenvalue), only invertible Hessenberg matrices remain. We shall therefore assume from now on that $A\in GL_n(K)$.



10.2.3 Complexity of an Iteration 

An iteration of the QR method requires the factorization $A_j = Q_jR_j$ and the computation of $A_{j+1} = R_jQ_j$. Each part costs $O(n^3)$ operations if it is done on a generic matrix (using the naive way of multiplying matrices). Since the reduction to Hessenberg form has a comparable cost, we lose nothing by first reducing $A$ to this form. Actually, we make considerable gains in two respects. First, the cost of each QR iteration drops to $O(n^2)$. Second, the cluster values of the sequence $(A_j)_j$ must have the Hessenberg form too.

Let us first examine the Householder method of QR factorization for a generic matrix $A$. In practice, one computes only the factor $R$ and the unitary symmetries whose product is $Q$. One then multiplies $R$ on the right by these unitary matrices to obtain $A' = RQ$.

Let $a_1\in\mathbb C^n$ be the first column vector of $A$. We begin by determining a unit vector $v_1\in\mathbb C^n$ such that the hyperplane symmetry $H_1 := I_n - 2v_1v_1^*$ sends $a_1$ to $\|a_1\|_2e^1$. The matrix $H_1A$ has the form

$$H_1A = \begin{pmatrix}\|a_1\|_2 & * & \cdots & *\\ 0 & * & \cdots & *\\ \vdots & \vdots & & \vdots\\ 0 & * & \cdots & *\end{pmatrix}.$$

We then perform the same operations on the matrix extracted from $H_1A$ by deleting the first row and column, and so on. At the $k$th step, $H_k$ is a matrix of the form

$$H_k = \begin{pmatrix} I_k & 0\\ 0 & I_{n-k}-2v_kv_k^*\end{pmatrix},$$

where $v_k\in\mathbb C^{n-k}$ is a unit vector. The computation of $v_k$ requires $O(n-k)$ operations. The product $H_kA^{(k)}$, where $A^{(k)}$ is block-triangular, amounts to the product of two square matrices of size $n-k$, one of them being $I-2v_kv_k^*$. We thus compute a matrix $N-2vv^*N$ from $v$ and $N$, which costs about $4(n-k)^2$ operations. Summing from $k=1$ to $k=n-1$, we find that the computation of $R$ alone costs $4n^3/3+O(n^2)$ operations.

As indicated above, we do not compute the factor $Q = H_1\cdots H_{n-1}$. Instead we compute successively the matrices $RH_1\cdots H_k$. That requires $2n^3+O(n^2)$ operations (check this!). The complexity of one step of the QR method on a generic matrix is thus $10n^3/3+O(n^2)$.

Let us now analyze the situation when $A$ is a Hessenberg matrix. By induction on $k$, we see that $v_k$ belongs to the plane spanned by $e^k$ and $e^{k+1}$, so its computation needs only $O(1)$ operations. The product of $H_k$ and $A^{(k)}$ is then obtained by simply recomputing the rows of indices $k$ and $k+1$, which costs about $6(n-k)$ operations. Summing from $k=1$ to $n-1$, we find that the computation of $R$ alone costs $3n^2+O(n)$ operations.

The computation of the product $(RH_1\cdots H_{k-1})H_k$ needs about $6k$ operations. Finally, one QR step (factorization and product $RQ$) for a Hessenberg matrix costs $6n^2+O(n)$ operations, of which $4n^2+O(n)$ are multiplications.
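For a Hessenberg matrix the same step touches only two rows, and then two columns, at a time. The sketch below is one way to realize the $O(n^2)$ count. As a substitution of equal cost, it uses $2\times2$ plane rotations (Givens rotations) in the planes $(e^k,e^{k+1})$ instead of the $2\times2$ reflections $H_k$ of the text. It assumes a real upper Hessenberg input, and the function name is hypothetical.

```python
import numpy as np

def hessenberg_qr_step(H):
    """One QR step H -> RQ for a real upper Hessenberg matrix H, in O(n^2) operations."""
    R = np.array(H, dtype=float)
    n = R.shape[0]
    rotations = []
    for k in range(n - 1):
        a, b = R[k, k], R[k + 1, k]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        G = np.array([[c, s], [-s, c]])        # G @ [a, b] = [r, 0]
        # only rows k and k+1 change, from column k on: about 6(n - k) operations
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]
        rotations.append(G)
    B = R                                      # R is now upper triangular
    for k, G in enumerate(rotations):
        # right multiplication by G^T: only columns k, k+1 and rows 0..k+1 change
        B[:k + 2, k:k + 2] = B[:k + 2, k:k + 2] @ G.T
    return B
```

The result is again Hessenberg. For a symmetric tridiagonal input, it is again symmetric tridiagonal.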

To sum up, the cost of the preliminary reduction of a matrix to Hessen- 
berg form is less than or equal to what is saved during the first iteration 
of the QR method. 




10.2.4 Convergence of the QR Method 

As explained above, the best convergence statement assumes that the 
eigenvalues have distinct moduli. 

Let us recall that the sequence $(A_k)_k$ is not always convergent. For example, suppose $A$ is already triangular. Its QR factorization is then $Q = D$, $R = D^{-1}A$, with $D = \mathrm{diag}(d_1,\dots,d_n)$ and $d_j = a_{jj}/|a_{jj}|$.

Hence $A_1 = D^{-1}AD$ is triangular, with the same diagonal as $A$. By induction, every $A_k$ is triangular with the same diagonal as $A$. We thus have $Q_k = D$ for every $k$, so that $A_k = D^{-k}AD^k$.

The entry of index $(l,m)$ is therefore multiplied at each step by the unit number $z_{lm} = \bar d_ld_m$, which is not necessarily equal to one if $l<m$. Hence the part of $A_k$ above the diagonal does not converge.
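The triangular example can be replayed numerically. In this sketch the matrix is an arbitrary choice, and `qr_pos` is a hypothetical helper that normalizes the diagonal of $R$ to be positive. Here $d=(i,-1,1)$, so one QR step multiplies the entry $(1,2)$ by $\bar d_1d_2 = i$. This repeats at every step, so $A_k$ has no limit, while the diagonal never moves.

```python
import numpy as np

def qr_pos(A):
    """QR factorization with a positive real diagonal in R (A invertible)."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R) / np.abs(np.diag(R))
    return Q * d, np.conj(d)[:, None] * R

A = np.array([[1j, 1.0, 2.0],
              [0.0, -2.0, 1.0],
              [0.0, 0.0, 3.0]])                     # upper triangular
D = np.diag(np.diag(A) / np.abs(np.diag(A)))        # D = diag(i, -1, 1)

Q, R = qr_pos(A)                                    # expected: Q = D, R = D^{-1} A
A1 = R @ Q                                          # expected: D^{-1} A D
```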

Summing up, a convergence theorem may concern only the diagonal of 
Ak and what is below it. 

Lemma 10.2.1 Let $A\in GL_n(K)$ be given, with $K=\mathbb R$ or $\mathbb C$. Let $A_k = Q_kR_k$ be the sequence of matrices given by the QR algorithm. Let us define $P_k = Q_0\cdots Q_{k-1}$ and $U_k = R_{k-1}\cdots R_0$. Then $P_kU_k$ is the QR factorization of the $k$th power of $A$:

$$A^k = P_kU_k.$$

Proof

From (10.2), we have $A_k = P_k^*AP_k$, that is, $P_kA_k = AP_k$. Then

$$P_{k+1}U_{k+1} = P_kQ_kR_kU_k = P_kA_kU_k = AP_kU_k.$$

By induction, $P_kU_k = A^k$. Moreover, $P_k\in U_n$, and $U_k$ is triangular with a positive real diagonal, as a product of such matrices.
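The lemma can be checked numerically. In this sketch the test matrix and the helper names `qr_pos` and `qr_with_accumulation` are assumptions made for illustration. It runs $k$ QR steps while accumulating $P_k = Q_0\cdots Q_{k-1}$ and $U_k = R_{k-1}\cdots R_0$. It then compares $P_kU_k$ with $A^k$, and $A_k$ with $P_k^*AP_k$.

```python
import numpy as np

def qr_pos(A):
    """QR factorization with a positive real diagonal in R (A invertible)."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R) / np.abs(np.diag(R))
    return Q * d, np.conj(d)[:, None] * R

def qr_with_accumulation(A, k):
    """Run k QR steps; return (A_k, P_k, U_k) with P_k = Q_0...Q_{k-1}, U_k = R_{k-1}...R_0."""
    Aj = np.array(A, dtype=complex)
    n = Aj.shape[0]
    P = np.eye(n, dtype=complex)
    U = np.eye(n, dtype=complex)
    for _ in range(k):
        Q, R = qr_pos(Aj)
        Aj = R @ Q
        P = P @ Q                              # append Q_j on the right
        U = R @ U                              # prepend R_j on the left
    return Aj, P, U
```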



Theorem 10.2.1 Let $A\in GL_n(\mathbb C)$ be given. Assume that the moduli of the eigenvalues of $A$ are distinct:

$$|\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|\ (>0).$$

In particular, the eigenvalues are simple, and thus $A$ is diagonalizable:

$$A = Y^{-1}\,\mathrm{diag}(\lambda_1,\dots,\lambda_n)\,Y.$$

Assume also that $Y$ admits an LU factorization. Then the strictly lower triangular part of $A_k$ converges to zero, and the diagonal of $A_k$ converges to $D := \mathrm{diag}(\lambda_1,\dots,\lambda_n)$.

Proof 

Let $Y = LU$ be the factorization of $Y$. We also make use of the QR factorization of $Y^{-1}$: $Y^{-1} = QR$. Since $A^k = Y^{-1}D^kY$, we have $P_kU_k = Y^{-1}D^kY = QRD^kLU$.

The matrix $D^kLD^{-k}$ is lower triangular with ones on its diagonal. By assumption, its strictly lower part tends to zero: each entry of index $(i,j)$, $i>j$, is multiplied by $(\lambda_i/\lambda_j)^k$, where $|\lambda_i/\lambda_j|<1$. Therefore, $D^kLD^{-k} = I_n+E_k$ with $E_k\to0_n$ as $k\to+\infty$. Hence,

$$P_kU_k = QR(I_n+E_k)D^kU = Q(I_n+RE_kR^{-1})RD^kU = Q(I_n+F_k)RD^kU,$$

where $F_k := RE_kR^{-1}\to0_n$. Let $O_kT_k = I_n+F_k$ be the QR factorization of $I_n+F_k$. By continuity, $O_k$ and $T_k$ both tend to $I_n$. Then

$$P_kU_k = (QO_k)(T_kRD^kU).$$

The first product is a unitary matrix, while the second is a triangular one. Let $|D|$ be the "modulus" matrix of $D$ (whose entries are the moduli of those of $D$), and let $D_1 := |D|^{-1}D$, which is unitary. We also define $D_2 = \mathrm{diag}(u_{jj}/|u_{jj}|)$ and $U' = D_2^{-1}U$. Then $D_2$ is unitary and the diagonal of $U'$ is positive real. From the uniqueness of the QR factorization of an invertible matrix, we obtain

$$P_k = QO_kD_1^kD_2,\qquad U_k = (D_1^kD_2)^{-1}T_kRD_1^kD_2|D|^kU',$$

which yields

$$Q_k = P_k^{-1}P_{k+1} = D_2^{-1}D_1^{-k}O_k^{-1}O_{k+1}D_1^{k+1}D_2,$$

$$R_k = U_{k+1}U_k^{-1} = D_2^{-1}D_1^{-k-1}T_{k+1}RDR^{-1}T_k^{-1}D_1^kD_2.$$

We know that $O_k^{-1}O_{k+1}\to I_n$ and $T_{k+1},T_k^{-1}\to I_n$, and that $D_1^{\pm k}$ and $D_2$ are bounded. We deduce that $Q_k$ converges to $D_1$. Similarly, $R_k - R'_k\to0_n$, where

$$R'_k = D_2^{-1}D_1^{-k-1}RDR^{-1}D_1^kD_2. \qquad (10.3)$$

Since $R'_k$ is upper triangular, the strictly lower triangular part of $A_k = Q_kR_k$ tends to zero. Here we use that the sequence $(R_k)_{k\in\mathbb N}$ is bounded, because the set of matrices unitarily similar to $A$ is bounded. Similarly, the diagonal of $R'_k$ is $|D|$, which shows that the diagonal of $A_k$ converges to $D_1|D| = D$.

■ 
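The following sketch illustrates Theorem 10.2.1 on a matrix built as $A = Y^{-1}\mathrm{diag}(\lambda)Y$. The choices of $Y$ and $\lambda$ are assumptions of the example, made so that the hypotheses hold: the moduli are distinct, and the leading principal minors $1$, $-1$, $-4$ of $Y$ are nonzero. Because one eigenvalue is negative, the strictly upper part of $A_k$ keeps changing sign, in accordance with (10.3). The diagonal and the strictly lower part nevertheless converge. The helper `qr_pos` is hypothetical.

```python
import numpy as np

def qr_pos(A):
    """QR factorization with a positive real diagonal in R (A invertible)."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R) / np.abs(np.diag(R))
    return Q * d, np.conj(d)[:, None] * R

Y = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 3.0]])         # leading minors 1, -1, -4: Y = LU exists
lam = np.array([4.0, -2.0, 1.0])        # distinct moduli 4 > 2 > 1
A = np.linalg.inv(Y) @ np.diag(lam) @ Y

Ak = A.astype(complex)
for _ in range(80):                     # error roughly like (1/2)^k
    Q, R = qr_pos(Ak)
    Ak = R @ Q
```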

Remark: Formula (10.3) shows that the sequence $(A_k)_k$ does not converge, at least when the eigenvalues have distinct complex arguments. However, suppose the eigenvalues all have the same complex argument, for example if they are real and positive. Then $D_1 = \alpha I_n$ and $R_k\to T := D_2^{-1}R|D|R^{-1}D_2$; hence $A_k$ converges to $\alpha T$. Note that this limit is not diagonal in general.

The situation is especially favorable for tridiagonal Hermitian matrices. To begin with, we may assume that $A$ is positive definite, up to replacing $A$ by $A+\mu I_n$ with $\mu>\rho(A)$. Next, we can write $A$ in block-diagonal form, where the diagonal blocks are irreducible tridiagonal Hermitian matrices. The QR method then treats each block separately. We are thus reduced to the case of a Hermitian positive definite, tridiagonal and irreducible matrix. By Proposition 10.1.2, its eigenvalues are real, strictly positive, and simple: $\lambda_1>\cdots>\lambda_n>0$. We can then use the following statement.

Theorem 10.2.2 Let $A\in GL_n(\mathbb C)$ be an irreducible Hessenberg matrix whose eigenvalues have distinct moduli:

$$|\lambda_1|>\cdots>|\lambda_n|\ (>0).$$

Then the QR method converges; that is, the lower triangular part of $A_k$ converges to

$$\begin{pmatrix}\lambda_1 & & & \\ 0 & \lambda_2 & & \\ \vdots & \ddots & \ddots & \\ 0 & \cdots & 0 & \lambda_n\end{pmatrix}.$$

Proof 

In the light of Theorem 10.2.1, it is enough to show that the matrix $Y$ in the previous proof admits an LU factorization. We have $YA = \mathrm{diag}(\lambda_1,\dots,\lambda_n)Y$. The rows $l_j$ of $Y$ are thus left eigenvectors: $l_jA = \lambda_jl_j$.

If $x\in\mathbb C^n$ is nonzero, there exists a unique index $r$ such that $x_r\ne0$ while $x_j=0$ for all $j>r$. By induction on $m$, using the Hessenberg form and the irreducibility of $A$, we obtain $(A^mx)_{r+m}\ne0$, while $(A^mx)_j=0$ for $j>r+m$ ($0\le m\le n-r$). Hence the vectors $x, Ax,\dots,A^{n-r}x$ are linearly independent. A linear subspace that is stable under $A$ and contains $x$ thus has dimension at least $n-r+1$.

Let $F$ be a linear subspace, stable under $A$, of dimension $p\ge1$. Let $r$ be the smallest integer such that $F$ contains a nonzero vector $x$ with $x_{r+1}=\cdots=x_n=0$. The minimality of $r$ implies that $x_r\ne0$. Hence, we have $p\ge n-r+1$. By construction, the intersection of $F$ with the linear subspace $[e^1,\dots,e^{r-1}]$ spanned by $e^1,\dots,e^{r-1}$ reduces to $\{0\}$. Thus we also have $p+(r-1)\le n$. Finally, $r = n-p+1$, and we see that

$$F\oplus[e^1,\dots,e^{n-p}] = \mathbb C^n.$$

Let us choose $F = [l_1,\dots,l_q]^\perp$, which is stable under $A$. Then $p = n-q$, and we have

$$[l_1,\dots,l_q]^\perp\oplus[e^1,\dots,e^q] = \mathbb C^n.$$

This amounts to saying that $\det(l_je^k)_{1\le j,k\le q}\ne0$. In other words, the leading principal minor of order $q$ of $Y$ is nonzero. From Theorem 8.1.1, $Y$ admits an LU factorization.



Corollary 10.2.1 Let $A\in\mathrm{HPD}_n$, and let $A_0$ be a Hessenberg matrix unitarily similar to $A$ (for example, a matrix obtained by Householder's method). Then the sequence $A_k$ defined by the QR method converges to a diagonal matrix whose diagonal entries are the eigenvalues of $A$.

Indeed, $A_0$ is Hermitian and Hessenberg, hence tridiagonal, and therefore block-diagonal with irreducible diagonal blocks. We are thus reduced to the case of a Hermitian positive definite tridiagonal irreducible matrix. Such a matrix satisfies the hypotheses of Theorem 10.2.2. Its lower triangular part converges; since the matrix is Hermitian, the whole matrix converges.

Implementing the QR method: The QR method converges faster as $\lambda_n$, or merely $\lambda_n/\lambda_{n-1}$, becomes smaller. We can obtain this situation by translating $A_k\mapsto A_k-\alpha_kI_n$. This procedure is called Rayleigh translation; the strategies for the choice of $\alpha_k$ are described in [25]. It allows for a noticeable improvement of the convergence of the QR method.

If the eigenvalues of $A$ are simple, a suitable translation allows us to restrict ourselves to the case of distinct moduli. But this trick has a nonnegligible cost if $A$ is a real matrix with a pair of complex conjugate eigenvalues, since it requires a translation by a nonreal number $\alpha$. As mentioned above, the computations then become much more costly than they are in the domain of real numbers.

As $k$ increases, the triangular form of $A_k$ appears first at the last row. In other words, the sequence $((A_k)_{nn})_k$ converges more rapidly than the other sequences $((A_k)_{jj})_k$. When the last row is sufficiently close to $(0,\dots,0,\lambda_n)$, the Rayleigh translation must be chosen so as to bring $\lambda_{n-1}$, instead of $\lambda_n$, to the origin; and so on.
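The translation and deflation strategy can be sketched as follows for a real symmetric tridiagonal matrix. The text leaves the choice of $\alpha_k$ to [25]. As an assumption, this sketch uses Wilkinson's shift, a common choice: the eigenvalue of the trailing $2\times2$ block closest to its last diagonal entry. For brevity, each step uses a dense `numpy.linalg.qr` rather than the $O(n^2)$ Hessenberg step.

Once the last subdiagonal entry is negligible, the last diagonal entry is accepted as an eigenvalue and the active block shrinks by one, as described above. The function name is hypothetical.

```python
import numpy as np

def shifted_qr_eigenvalues(A, tol=1e-12, max_iter=1000):
    """Eigenvalues of a real symmetric tridiagonal matrix by the translated QR method,
    with deflation of the last row (Wilkinson's shift is an assumption here)."""
    H = np.array(A, dtype=float)
    m = H.shape[0]                             # size of the active leading block
    eigenvalues = []
    for _ in range(max_iter):
        if m == 1:
            eigenvalues.append(H[0, 0])
            return np.sort(np.array(eigenvalues))
        a, c, b = H[m - 2, m - 2], H[m - 1, m - 1], H[m - 1, m - 2]
        if abs(b) <= tol * (abs(a) + abs(c)):
            eigenvalues.append(c)              # last row ~ (0, ..., 0, lambda): deflate
            m -= 1
            continue
        # eigenvalue of [[a, b], [b, c]] closest to c
        d = 0.5 * (a - c)
        sign = 1.0 if d >= 0.0 else -1.0
        alpha = c - sign * b * b / (abs(d) + np.hypot(d, b))
        Q, R = np.linalg.qr(H[:m, :m] - alpha * np.eye(m))
        H[:m, :m] = R @ Q + alpha * np.eye(m)  # translate back after the QR step
    raise RuntimeError("QR iteration did not converge")
```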

With a clever choice of Rayleigh translations, the QR method, when it 
converges, is of order two for a generic matrix, and is of order three for a 
Hermitian matrix. 



10.3 The Jacobi Method 

The Jacobi method allows for the approximate computation of the whole spectrum of a real symmetric matrix $A\in\mathrm{Sym}_n$. As in the QR method, one constructs a sequence of matrices unitarily similar to $A$; in particular, round-off errors are not amplified. Each iteration is cheap ($O(n)$ operations), and the convergence is quadratic when the eigenvalues are distinct. It is thus a rather efficient method.



10.3.1 Conjugating by a Rotation Matrix 

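Let $1\le p,q\le n$ be two distinct indices and $\theta\in[-\pi,\pi)$ an angle. We denote by $R_{p,q}(\theta)$ the rotation matrix through the angle $\theta$ in the plane spanned by $e^p$ and $e^q$. For example, if $p<q$, then

$$R = R_{p,q}(\theta) := \begin{pmatrix} I_{p-1} & 0 & 0 & 0 & 0\\ 0 & \cos\theta & 0 & \sin\theta & 0\\ 0 & 0 & I_{q-p-1} & 0 & 0\\ 0 & -\sin\theta & 0 & \cos\theta & 0\\ 0 & 0 & 0 & 0 & I_{n-q}\end{pmatrix}.$$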
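If $H$ is a symmetric matrix, we compute $K := R^{-1}HR = R^THR$, which is also symmetric. Setting $c=\cos\theta$ and $s=\sin\theta$, we obtain the following formulas:

$$\begin{aligned}
k_{ij} &= h_{ij} && \text{if } i,j\notin\{p,q\},\\
k_{ip} &= c\,h_{ip} - s\,h_{iq} && \text{if } i\notin\{p,q\},\\
k_{iq} &= c\,h_{iq} + s\,h_{ip} && \text{if } i\notin\{p,q\},\\
k_{pp} &= c^2h_{pp} + s^2h_{qq} - 2cs\,h_{pq}, &&\\
k_{qq} &= c^2h_{qq} + s^2h_{pp} + 2cs\,h_{pq}, &&\\
k_{pq} &= cs\,(h_{pp}-h_{qq}) + (c^2-s^2)\,h_{pq}. &&
\end{aligned}$$

The entries $k_{ij}$ with $i,j\notin\{p,q\}$ require no computation. The entries $k_{ip}$ and $k_{iq}$ for $i\notin\{p,q\}$ require $6(n-2)$ operations, and $k_{pp}$, $k_{qq}$ and $k_{pq}$ require $O(1)$. Keeping in mind the symmetry $K^T=K$, the cost of this conjugation is thus $6n+O(1)$ operations.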

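Let us remark that conjugating by the rotation through the angle $\theta\pm\pi$ yields the same matrix $K$, up to a change of sign of the entries $k_{ip}$ and $k_{iq}$ for $i\notin\{p,q\}$. In particular, the diagonal of $K$, the entry $k_{pq}$, and the norm of the off-diagonal part are unchanged. For this reason, we limit ourselves to angles $\theta\in[-\pi/2,\pi/2)$.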
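The formulas above can be checked against the dense product $R^THR$. In the sketch below, `rotation` builds $R_{p,q}(\theta)$ with the sign convention displayed above (entry $(p,q)$ equal to $\sin\theta$, entry $(q,p)$ equal to $-\sin\theta$), using 0-based indices. `conjugate_by_rotation` updates only rows and columns $p$ and $q$, at $O(n)$ cost. Both function names and the random test matrix are illustrative assumptions.

```python
import numpy as np

def rotation(n, p, q, theta):
    """Dense R_{p,q}(theta), 0-based p, q: R[p, q] = sin(theta), R[q, p] = -sin(theta)."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[p, p] = R[q, q] = c
    R[p, q], R[q, p] = s, -s
    return R

def conjugate_by_rotation(H, p, q, theta):
    """K = R^T H R for a symmetric H, updating only rows and columns p and q."""
    H = np.asarray(H, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    K = H.copy()
    kp = c * H[:, p] - s * H[:, q]             # k_ip for i outside {p, q}
    kq = c * H[:, q] + s * H[:, p]             # k_iq for i outside {p, q}
    K[:, p], K[p, :] = kp, kp
    K[:, q], K[q, :] = kq, kq
    hpp, hqq, hpq = H[p, p], H[q, q], H[p, q]
    K[p, p] = c * c * hpp + s * s * hqq - 2 * c * s * hpq
    K[q, q] = c * c * hqq + s * s * hpp + 2 * c * s * hpq
    K[p, q] = K[q, p] = c * s * (hpp - hqq) + (c * c - s * s) * hpq
    return K
```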



10.3.2 Description of the Method 

One constructs a sequence $A^{(0)} = A, A^{(1)},\dots$ of symmetric matrices, each one conjugate to the previous one by a rotation as above: $A^{(k+1)} = (R^{(k)})^TA^{(k)}R^{(k)}$. At step $k$, we choose two distinct indices $p$ and $q$ (in fact, $p_k, q_k$) such that $a^{(k)}_{pq}\neq 0$. If this is not possible, $A^{(k)}$ is already a diagonal matrix similar to $A$. We then choose $\theta$ (in fact $\theta_k$) such that $a^{(k+1)}_{pq}=0$. From the formulas above, this is equivalent to

$$cs\,\bigl(a^{(k)}_{pp}-a^{(k)}_{qq}\bigr) + (c^2-s^2)\,a^{(k)}_{pq} = 0.$$

This amounts to solving the equation

$$\cot 2\theta = \frac{a^{(k)}_{qq}-a^{(k)}_{pp}}{2a^{(k)}_{pq}} =: \sigma_k.$$

This equation possesses two solutions in $[-\pi/2,\pi/2)$, namely $\theta_k\in[-\pi/4,\pi/4)$ and $\theta_k\pm\pi/2$. There are thus two possible rotation matrices, which yield two distinct results. Once the choice has been made, there is no need to compute the angle itself, which would actually be rather expensive. In fact, $t := \tan\theta_k$ solves

$$\frac{2t}{1-t^2} = \tan 2\theta_k = \frac{1}{\sigma_k},$$

that is,

$$t^2 + 2\sigma_kt - 1 = 0.$$

The two angles correspond to the two roots of this quadratic equation. We then obtain

$$c = \frac{1}{\sqrt{1+t^2}},\qquad s = tc.$$

We shall see below that the best choice is the angle $\theta_k\in[-\pi/4,\pi/4)$, which corresponds to the unique root $t$ in $[-1,1)$.

The computation of $c$ and $s$ needs only $O(1)$ operations, so that an iteration of the Jacobi method costs $6n+O(1)$ operations. Observe that an entry that has vanished at one iteration in general becomes nonzero again after a few more iterations.
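One Jacobi step can be sketched as follows; the function name and the random test matrix are hypothetical. The root of $t^2+2\sigma_kt-1=0$ lying in $[-1,1)$ is written in a form that avoids cancellation. The new diagonal entries use the identities $k_{pp}=h_{pp}-t\,h_{pq}$ and $k_{qq}=h_{qq}+t\,h_{pq}$, which hold once $k_{pq}=0$. The test checks that the spectrum is preserved and that $\sum_{i\ne j}k_{ij}^2 = \sum_{i\ne j}h_{ij}^2 - 2h_{pq}^2$, which is Lemma 10.3.1.

```python
import numpy as np

def jacobi_rotation(H, p, q):
    """One Jacobi step annihilating h_pq (assumed nonzero) of the symmetric matrix H.
    Uses the root t = tan(theta) in [-1, 1); returns (K, t)."""
    H = np.asarray(H, dtype=float)
    hpp, hqq, hpq = H[p, p], H[q, q], H[p, q]
    sigma = (hqq - hpp) / (2.0 * hpq)             # cot(2 theta) = sigma
    r = np.hypot(1.0, sigma)
    # root of t^2 + 2 sigma t - 1 = 0 lying in [-1, 1), without cancellation
    t = 1.0 / (sigma + r) if sigma > 0 else -1.0 / (r - sigma)
    c = 1.0 / np.sqrt(1.0 + t * t)
    s = t * c
    K = H.copy()
    kp = c * H[:, p] - s * H[:, q]
    kq = c * H[:, q] + s * H[:, p]
    K[:, p], K[p, :] = kp, kp
    K[:, q], K[q, :] = kq, kq
    K[p, p] = hpp - t * hpq                       # = c^2 hpp + s^2 hqq - 2cs hpq
    K[q, q] = hqq + t * hpq
    K[p, q] = K[q, p] = 0.0                       # vanishes by the choice of t
    return K, t
```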



10.3.3 Convergence of the Jacobi Method 

We use here the Schur norm $\|M\| = (\mathrm{Tr}\,M^TM)^{1/2}$, also called the Frobenius norm, denoted elsewhere by $\|M\|_2$. We have to show that $A^{(k)}$ converges to a diagonal matrix. We therefore decompose this matrix in the form $A^{(k)} = D_k + E_k$, where $D_k = \mathrm{diag}(a^{(k)}_{11},\dots,a^{(k)}_{nn})$. To begin with, since the sequence is formed of unitarily similar matrices, we have $\|A^{(k)}\| = \|A\|$.



Lemma 10.3.1 We have

$$\|E_{k+1}\|^2 = \|E_k\|^2 - 2\bigl(a^{(k)}_{pq}\bigr)^2.$$

Proof

It is sufficient to redo the calculations of Section 10.3.1, noting that

$$k_{ip}^2 + k_{iq}^2 = h_{ip}^2 + h_{iq}^2$$

whenever $i\ne p,q$, while $k_{pq}=0$.

■

We deduce from the lemma that $\|D_{k+1}\|^2 = \|D_k\|^2 + 2\bigl(a^{(k)}_{pq}\bigr)^2$. The convergence of the Jacobi method therefore depends on the choice of the pair $(p,q)$ at each step. For example, choosing the same pair at two consecutive iterations is pointless: $a^{(k+1)}_{pq}$ already vanishes, so the second iteration would leave the matrix unchanged.

A first strategy, the so-called optimal choice, consists in taking the pair $(p,q)$ that optimizes the instantaneous decay of $\|E_k\|$, that is, the pair that maximizes $|a^{(k)}_{pq}|$. Since this requires a search among $n(n-1)/2$ entries at each step, it is rather expensive. Other strategies are available. One can, for instance, range over every pair $(p,q)$ with $p<q$, or choose a pair $(p,q)$ for which $|a^{(k)}_{pq}|$ is larger than some threshold. Here we shall study only the method with optimal choice.
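A minimal sketch of the Jacobi method with optimal choice follows. The function name, tolerance and test matrix are assumptions. Each step performs the $O(n)$ rotation update described above. The $O(n^2)$ search for the largest off-diagonal entry is the expensive part mentioned in the text. The recorded values of $\|E_k\|^2$ let one check the decay $\|E_{k+1}\|^2\le\bigl(1-\tfrac{2}{n^2-n}\bigr)\|E_k\|^2$, which is established in the proof of Theorem 10.3.1.

```python
import numpy as np

def jacobi_optimal(A, tol=1e-12, max_iter=10_000):
    """Jacobi method with the 'optimal choice' of (p, q) at each step.
    Returns the sorted diagonal and the list of ||E_k||^2."""
    K = np.array(A, dtype=float)
    history = []
    for _ in range(max_iter):
        E = K - np.diag(np.diag(K))
        history.append(float(np.sum(E ** 2)))     # ||E_k||^2 in the Schur norm
        p, q = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        if abs(E[p, q]) < tol:
            break
        hpp, hqq, hpq = K[p, p], K[q, q], K[p, q]
        sigma = (hqq - hpp) / (2.0 * hpq)
        r = np.hypot(1.0, sigma)
        t = 1.0 / (sigma + r) if sigma > 0 else -1.0 / (r - sigma)   # t in [-1, 1)
        c = 1.0 / np.sqrt(1.0 + t * t)
        s = t * c
        kp = c * K[:, p] - s * K[:, q]
        kq = c * K[:, q] + s * K[:, p]
        K[:, p], K[p, :] = kp, kp
        K[:, q], K[q, :] = kq, kq
        K[p, p], K[q, q] = hpp - t * hpq, hqq + t * hpq
        K[p, q] = K[q, p] = 0.0
    return np.sort(np.diag(K)), history
```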



Theorem 10.3.1 With the “optimal choice” of $(p_k,q_k)$ and with the choice $\theta_k\in[-\pi/4,\pi/4)$, the Jacobi method converges in the following sense. There exists a diagonal matrix $D$ such that

$$\|A^{(k)} - D\| \le \frac{\sqrt2\,\|E_0\|}{1-\rho}\,\rho^k,\qquad \rho := \left(1-\frac{2}{n^2-n}\right)^{1/2}.$$

In particular, the spectrum of $A$ consists of the diagonal terms of $D$, and the Jacobi method is at least of order one.

Proof 

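With the optimal choice of $(p,q)$, we have

$$(n^2-n)\bigl(a^{(k)}_{pq}\bigr)^2 \ge \|E_k\|^2.$$

Hence,

$$\|E_{k+1}\|^2 \le \left(1-\frac{2}{n^2-n}\right)\|E_k\|^2.$$

It follows that $\|E_k\|\le\rho^k\|E_0\|$. In particular, $E_k$ tends to zero as $k\to+\infty$.

It remains to show that $D_k$ converges too. A calculation using the notation of Section 10.3.1 and the fact that $k_{pq}=0$ yields

$$k_{pp} = h_{pp} - t\,h_{pq},\qquad k_{qq} = h_{qq} + t\,h_{pq}.$$

Since $|\theta_k|\le\pi/4$, we have $|t|\le 1$, so that $|a^{(k+1)}_{pp} - a^{(k)}_{pp}|\le|a^{(k)}_{pq}|$. Likewise, $|a^{(k+1)}_{qq}-a^{(k)}_{qq}|\le|a^{(k)}_{pq}|$. Since the other diagonal entries are unchanged, we have $\|D_{k+1}-D_k\|\le\|E_k\|$.

We have seen that $\|E_k\|\le\rho^k\|E_0\|$. Therefore,

$$\|D_l - D_k\| \le \|E_0\|\,\frac{\rho^k}{1-\rho},\qquad l>k.$$

The sequence $(D_k)_{k\in\mathbb N}$ is thus Cauchy, hence convergent. Since $E_k$ tends to zero, $A^{(k)}$ converges to the same limit $D$. This matrix is diagonal, with the same spectrum as $A$, since this is true for each $A^{(k)}$. Finally, we obtain

$$\|A^{(k)}-D\|^2 = \|D_k-D\|^2 + \|E_k\|^2 \le \left(\frac{1}{(1-\rho)^2}+1\right)\rho^{2k}\|E_0\|^2 \le \frac{2\rho^{2k}}{(1-\rho)^2}\,\|E_0\|^2.$$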



10.3.4 Quadratic Convergence 

The following statement shows that the Jacobi method compares rather 
well with other methods. 

Theorem 10.3.2 The Jacobi method with optimal choice of $(p,q)$ is of order two when the eigenvalues of $A$ are simple, in the following sense. Let $N = n(n-1)/2$ be the number of entries below the diagonal. Then there exists a number $c>0$ such that

$$\|E_{k+N}\|\le c\,\|E_k\|^2$$

for every $k\in\mathbb N$.



Proof 
We first remark that if $i\ne j$ with $\{i,j\}\ne\{p_l,q_l\}$, then

$$\bigl|a^{(l+1)}_{ij}-a^{(l)}_{ij}\bigr| \le |t_l|\sqrt2\,\|E_l\|, \qquad (10.5)$$

where $t_l = \tan\theta_l$. To see this, observe that $1-c\le|t|$ and $|s|\le|t|$ whenever $|t|\le1$.

Now, Theorem 10.3.1 ensures that $D_k$ converges to $\mathrm{diag}(\lambda_1,\dots,\lambda_n)$, where the $\lambda_j$'s are the eigenvalues of $A$. Since these are distinct, there exist $K\in\mathbb N$ and $\delta>0$ such that

$$\min_{i\ne j}\bigl|a^{(k)}_{ii}-a^{(k)}_{jj}\bigr|\ge\delta$$

for $k\ge K$. We therefore have $|\sigma_k|\ge\delta/(\sqrt2\,\|E_k\|)$, which tends to $+\infty$ as $k\to+\infty$. It follows that $t_k$ tends to zero and, more precisely, that

$$t_k\sim\frac{1}{2\sigma_k}.$$

Finally, there exists a constant $c_1$ such that

$$|t_k|\le c_1\|E_k\|.$$

Let us then fix $k$ larger than $K$, and let $J$ denote the set of pairs $(p_l,q_l)$ for $k\le l\le k+N-1$. For such an index $l$, we have $\|E_l\|\le\|E_k\|$. In particular, $|t_l|\le c_1\|E_k\|$.

Let $(p,q)\in J$, and let $l<k+N$ be the largest index such that $(p,q)=(p_l,q_l)$. A repeated application of (10.5) shows that

$$\bigl|a^{(k+N)}_{pq}\bigr|\le c_1N\sqrt2\,\|E_k\|^2.$$

If $J$ is equal to the set of pairs $(i,j)$ such that $i<j$, these inequalities ensure that $\|E_{k+N}\|\le c_2\|E_k\|^2$. Otherwise, some pair $(p,q)$ is set to zero twice: $(p,q)=(p_l,q_l)=(p_m,q_m)$ with $k\le l<m<k+N$. In that case, the same argument as above shows that

$$\|E_{k+N}\|\le\|E_m\|\le\sqrt{2N}\,\bigl|a^{(m)}_{pq}\bigr|\le 2\sqrt N\,c_1(m-l)\,\|E_k\|^2.$$



Remarks: Exercise 18 shows that the distance between the diagonal and the spectrum of $A$ is $O(\|E_k\|^2)$, and not $O(\|E_k\|)$ as naively expected. We shall also analyze, in Exercise 10, the (bad) behavior of $D_k$ when we make the opposite choice $\pi/4\le|\theta_k|\le\pi/2$.



10.4 The Power Methods 

The power methods allow only for the approximation of a single eigenvalue. Of course, their cost is significantly lower than that of the previous ones. The standard method is especially well suited to the search for the optimal parameter in the SOR method for a tridiagonal matrix, where we have to compute the spectral radius of the Jacobi iteration matrix (Theorem 9.4.1).



10.4.1 The Standard Method

Let $M\in M_n(\mathbb C)$ be a matrix. We seek an approximation of its eigenvalue of maximum modulus, whenever there is only one such eigenvalue. The standard method consists in choosing a norm on $\mathbb C^n$ and a unit vector $x^0\in\mathbb C^n$, and then computing successively the vectors $x^k$ by the formula

$$x^{k+1} := \frac{1}{\|Mx^k\|}\,Mx^k.$$

The justification of this method is given in the following theorem. 



Theorem 10.4.1 Assume that $\mathrm{Sp}\,M$ contains only one element $\lambda$ of maximal modulus (that modulus is thus equal to $\rho(M)$).

If $\rho(M)=0$, the method stops because $Mx^k=0$ for some $k<n$. Otherwise, let $\mathbb C^n = E\oplus F$ be the decomposition of $\mathbb C^n$ into linear subspaces $E$, $F$ stable under $M$, with $\mathrm{Sp}(M|_E)=\{\lambda\}$ and $\lambda\notin\mathrm{Sp}(M|_F)$. Assume that $x^0\notin F$. Then $Mx^k\ne0$ for every $k\in\mathbb N$, and:

1. $$\lim_{k\to+\infty}\|Mx^k\| = \rho(M). \qquad (10.6)$$

2. $$V := \lim_{k\to+\infty}\left(\frac{\bar\lambda}{\rho(M)}\right)^k x^k$$ is a unit eigenvector of $M$, associated to the eigenvalue $\lambda$.

3. If $V_j\ne0$, then $$\lim_{k\to+\infty}\frac{(Mx^k)_j}{x^k_j} = \lambda.$$



Proof 

The case $\rho(M)=0$ is obvious because $M$ is then nilpotent. We may thus assume that $\rho(M)>0$.

Let $x^0 = y^0+z^0$ be the decomposition of $x^0$ with $y^0\in E$ and $z^0\in F$. By assumption, $y^0\ne0$. Since $M|_E$ is invertible, $M^ky^0\ne0$. Since $M^kx^0 = M^ky^0+M^kz^0$ with $M^ky^0\in E$ and $M^kz^0\in F$, we conclude that $M^kx^0\ne0$.
The algorithm may be rewritten as¹

$$x^k = \frac{1}{\|M^kx^0\|}\,M^kx^0.$$

We therefore have $Mx^k\ne0$.

If $F\ne\{0\}$, then $\rho(M|_F)<\rho(M)$ by construction. Hence, by Theorem 4.2.1, there exist $\eta<\rho(M)$ and $C>0$ such that $\|(M|_F)^k\|\le C\eta^k$ for every $k$. Then $\|M^kz^0\|\le C\|z^0\|\eta^k$.

On the other hand, $\rho((M|_E)^{-1}) = 1/\rho(M)$. The same argument as above ensures that $\|(M|_E)^{-k}\|\le C'\mu^{-k}$ for some $\mu\in(\eta,\rho(M))$ and some $C'>0$, so that $\|M^ky^0\|\ge\mu^k\|y^0\|/C'$. Hence,

$$\|M^kz^0\| \ll \|M^ky^0\|,$$

so that

$$x^k - \frac{M^ky^0}{\|M^ky^0\|}\longrightarrow 0.$$

We are thus reduced to the case where $z^0=0$, that is, where $M$ has no eigenvalue other than $\lambda$. This will be assumed from now on.

Let $r$ be the degree of the minimal polynomial of $M$. The vector space spanned by $x^0, Mx^0,\dots,M^{r-1}x^0$ contains all the $x^k$'s. Replacing $\mathbb C^n$ by this linear subspace, we may assume that it equals $\mathbb C^n$. Then we have $r=n$. Furthermore, $\ker(M-\lambda)^{n-1}$ is a proper linear subspace that is stable under $M$. Hence $x^0\notin\ker(M-\lambda)^{n-1}$.

The vector space $\mathbb C^n$ then admits the basis

$$\{v^1 = x^0,\ v^2 = (M-\lambda)x^0,\ \dots,\ v^n = (M-\lambda)^{n-1}x^0\}.$$

With respect to this basis, $M$ becomes the Jordan matrix

$$M = \begin{pmatrix}\lambda & 0 & \cdots & 0\\ 1 & \lambda & \ddots & \vdots\\ & \ddots & \ddots & 0\\ 0 & & 1 & \lambda\end{pmatrix}.$$

The matrix $\lambda^{-k}M^k$ depends polynomially on $k$. The coefficient of highest degree, as $k\to+\infty$, lies at the intersection of the first column and the last row. It equals

$$\binom{k}{n-1}\lambda^{1-n},$$

¹One could normalize at the end of the computation, but we prefer doing it at each step in order to avoid overflows, and also to ensure (10.6).
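which is equivalent to $(k/\lambda)^{n-1}/(n-1)!$. We deduce that

$$M^kx^0 \sim \frac{k^{n-1}\lambda^{k-n+1}}{(n-1)!}\,v^n.$$

Hence,

$$\left(\frac{\bar\lambda}{\rho(M)}\right)^k x^k \longrightarrow \left(\frac{\bar\lambda}{|\lambda|}\right)^{n-1}\frac{v^n}{\|v^n\|}.$$

Since $v^n$ is an eigenvector of $M$, the claims of the theorem have been proved.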



The case where the algebraic and geometric multiplicities of $\lambda$ are equal (that is, $M|_E = \lambda I_E$), for example when $\lambda$ is a simple eigenvalue, is especially favorable. Indeed, $M^ky^0 = \lambda^ky^0$, and therefore

$$x^k = \frac{|\lambda|^k}{\|M^kx^0\|}\left(\left(\frac{\lambda}{|\lambda|}\right)^ky^0 + \frac{(M|_F)^kz^0}{|\lambda|^k}\right).$$

Theorem 4.2.1 thus shows that the error

$$\left(\frac{\bar\lambda}{\rho(M)}\right)^kx^k - \frac{y^0}{\|y^0\|}$$

tends to zero faster than

$$\left(\frac{\rho(M|_F)+\epsilon}{\rho(M)}\right)^k$$

for every $\epsilon>0$. The convergence is thus of order one. It becomes faster as the ratio $|\lambda_2|/|\lambda_1|$ becomes smaller, with the eigenvalues arranged by nonincreasing moduli. However, the convergence is much slower when the Jordan blocks of $M$ relative to $\lambda$ are nontrivial. The error then decays like $1/k$ in general.
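A minimal NumPy sketch of the standard method with the Euclidean norm follows; the function name and the test matrix are assumptions of the example. It returns two estimates:

- $\|Mx^k\|$, which approximates $\rho(M)$ (item 1 of Theorem 10.4.1);
- the quotient $(Mx^k)_j/x^k_j$, which approximates $\lambda$ (item 3). The index $j$ is taken at the largest component of $x^k$ so that the denominator does not vanish.

With the dominant eigenvalue $-3$ and the second eigenvalue $1$, the error decreases roughly like $3^{-k}$. The vectors $x^k$ themselves alternate in sign, so only a suitably rotated sequence converges.

```python
import numpy as np

def power_method(M, x0, iters=100):
    """Standard power method; returns estimates of rho(M) and of lambda."""
    x = np.asarray(x0, dtype=complex)
    x = x / np.linalg.norm(x)                  # unit vector x^0
    for _ in range(iters):
        y = M @ x
        x = y / np.linalg.norm(y)              # x^{k+1} = M x^k / ||M x^k||
    y = M @ x
    rho_est = np.linalg.norm(y)                # item 1: ||M x^k|| -> rho(M)
    j = int(np.argmax(np.abs(x)))              # a component of x^k that does not vanish
    lam_est = y[j] / x[j]                      # item 3: (M x^k)_j / x^k_j -> lambda
    return rho_est, lam_est
```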

The situation is more delicate when $\rho(M)$ is the modulus of several distinct eigenvalues. The vector $x^k$, suitably normalized, does not converge in general but “spins” closer and closer to the sum of the corresponding eigenspaces. Observing the asymptotic behavior of $x^k$ allows us to identify the eigendirections associated to the eigenvalues of maximal modulus. The sequence $\|Mx^k\|$ does not converge and depends strongly on the choice of the norm. However, $\log\|Mx^k\|$ converges in the Cesàro sense, that is, in the mean, to $\log\rho(M)$ (Exercise 12).

Remark: The hypothesis on $x^0$ is generic, in the sense that it is satisfied for every choice of $x^0$ in an open dense subset of $\mathbb C^n$. If by chance $x^0$ belongs to $F$, the power method theoretically furnishes another eigenvalue, of smaller modulus. In practice, a large enough number of iterations always leads to convergence to $\lambda$. In fact, the number $\lambda$ is rarely exactly representable in a computer. If it is not, the linear subspace $F$ does not contain any nonzero representable vector. Thus the vector $x^k$, or rather its computer representation, does not belong to $F$, and Theorem 10.4.1 applies.
10.4.2 The Inverse Power Method

Let us assume that $M$ is invertible. The standard power method, applied to $M^{-1}$, furnishes the eigenvalue of least modulus whenever it is simple, or at least its modulus in the general case. Since inverting a matrix is costly, we pursue this idea only if $M$ has already been inverted, for example if we have previously had to compute an LU or a QR factorization. That is typically the situation when one begins to implement the QR algorithm for $M$. It might look strange to use a method giving only one eigenvalue in the course of a method that is expected to compute the whole spectrum.

The inverse power method is thus subtle. Here is the idea. One begins by implementing the QR method until one gets coarse approximations $\mu_1,\dots,\mu_n$ of the eigenvalues $\lambda_1,\dots,\lambda_n$. If one persists with the QR method, the proof of Theorem 10.2.1 shows that the error is at best of order $\sigma^k$ with $\sigma = \max_j|\lambda_{j+1}/\lambda_j|$. When $n$ is large, $\sigma$ is in general close to 1 and this convergence is rather slow. Similarly, the method with Rayleigh translations, for which $\sigma$ is replaced by $\sigma(\eta) := \max_j|(\lambda_{j+1}-\eta)/(\lambda_j-\eta)|$, is not satisfactory.

However, suppose one wishes to compute a single eigenvalue, say $\lambda_p$, with full accuracy. The power method applied to $(M-\mu_pI_n)^{-1}$ produces an error of order $\theta^k$, where $\theta := |\lambda_p-\mu_p|/\min_{j\ne p}|\lambda_j-\mu_p|$. This is a small number, since $\lambda_p-\mu_p$ is small.

In practice, the inverse power method is used mainly to compute an 
approximate eigenvector, associated to an eigenvalue for which one already 
has a good approximate value. 
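A minimal sketch of inverse iteration with a shift $\mu$ follows, restricted to real matrices for brevity; the function name and the test data are hypothetical. For simplicity it calls `numpy.linalg.solve` at each step. In practice one would factor $M-\mu I_n$ once (LU or QR) and reuse the factorization. The eigenvalue estimate at the end is the Rayleigh quotient $x^TMx$ of the unit vector $x$. This is an assumption of the sketch, suited to symmetric matrices; for a nonsymmetric matrix one could use the component ratio of Theorem 10.4.1 instead.

```python
import numpy as np

def inverse_power(M, mu, x0, iters=50):
    """Power method applied to (M - mu I)^{-1}; returns (eigenvalue estimate, unit vector)."""
    M = np.asarray(M, dtype=float)
    B = M - mu * np.eye(M.shape[0])
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(iters):
        y = np.linalg.solve(B, x)              # y = (M - mu I)^{-1} x
        x = y / np.linalg.norm(y)
    lam = float(x @ M @ x)                     # Rayleigh quotient (x is a real unit vector)
    return lam, x
```

With $M=\begin{pmatrix}2&1\\1&3\end{pmatrix}$ and the coarse approximation $\mu=1.3$ of the eigenvalue $(5-\sqrt5)/2\approx1.382$, the ratio $\theta$ is about $0.035$, so a few dozen steps give full accuracy.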



10.5 Leverrier’s Method 

The method of Leverrier allows for the computation of the characteristic polynomial of a square matrix. Though inserted in this chapter, this method is not suitable for computing approximate values of the eigenvalues of a matrix. First of all, it furnishes only the characteristic polynomial. As mentioned at the opening of this chapter, computing the roots of the characteristic polynomial is not a good technique for computing the eigenvalues. Its interest is purely academic. Observe, however, that it is of great generality, applying to matrices with entries in any field of characteristic 0.



10.5.1 Description of the Method 

Let $K$ be a field of characteristic 0 and $M\in M_n(K)$ be given. Let us denote by $\lambda_1,\dots,\lambda_n$ the eigenvalues of $M$, counted with multiplicity. Let us define the following two lists of $n$ numbers:

Elementary symmetric polynomials

$$\begin{aligned}
\sigma_1 &:= \lambda_1+\cdots+\lambda_n = \mathrm{Tr}\,M,\\
\sigma_2 &:= \sum_{j<k}\lambda_j\lambda_k,\\
&\ \ \vdots\\
\sigma_r &:= \sum_{j_1<\cdots<j_r}\lambda_{j_1}\cdots\lambda_{j_r},\\
&\ \ \vdots\\
\sigma_n &:= \prod_j\lambda_j = \det M.
\end{aligned}$$

Newton sums

$$s_m := \sum_j\lambda_j^m,\qquad 1\le m\le n.$$

The numbers $(-1)^j\sigma_j$ are the coefficients of the characteristic polynomial of $M$:

$$P_M(X) = X^n - \sigma_1X^{n-1} + \cdots + (-1)^n\sigma_n.$$

Furthermore, the $s_m$ are the traces of the powers $M^m$, which one obtains by computing $M^2,\dots,M^n$. Each of these matrices costs $O(n^\alpha)$ operations, with $2<\alpha\le3$ ($\alpha=3$ using the naive method for the product of two matrices). In all, the computation of $s_1,\dots,s_n$ needs $O(n^{\alpha+1})$ operations. This is a lot compared to iterative methods (QR, Jacobi), in which each iteration takes at most $O(n^2)$ operations.

The passage from Newton sums to elementary symmetric polynomials is done through Newton's formulas. If $\tau_j := (-1)^j\sigma_j$ and $\tau_0 := 1$, we have

$$m\tau_m + \sum_{k=1}^m s_k\tau_{m-k} = 0,\qquad 1\le m\le n.$$

One uses these formulas in increasing order, beginning with $\tau_1 = -s_1$. When $\tau_1,\dots,\tau_{m-1}$ are known, one computes

$$\tau_m = -\frac1m\left(s_1\tau_{m-1}+\cdots+s_m\tau_0\right).$$

This computation needs only $O(n^2)$ operations, a negligible cost.

Besides the high cost of this method, its instability is unfortunate when $K=\mathbb R$ or $K=\mathbb C$: when $n$ is large, $s_k$ increases much more rapidly than $\sigma_k$. The eigenvalues of smaller modulus are thus strongly perturbed by round-off errors, and this is aggravated by the large number of operations.
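For comparison with the discussion above, here is a minimal sketch of Leverrier's method in exact rational arithmetic, using Python's `fractions.Fraction`. Exact arithmetic sidesteps the round-off instability but keeps the $O(n^{\alpha+1})$ cost of the matrix powers; the naive product ($\alpha=3$) is used. The function names are hypothetical. The result is the list $[\tau_0,\tau_1,\dots,\tau_n]$, so that $P_M(X)=\sum_j\tau_jX^{n-j}$.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Naive product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def leverrier(M):
    """Return [tau_0, ..., tau_n] with P_M(X) = X^n + tau_1 X^{n-1} + ... + tau_n."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    s = [None]                                 # s[m] = Tr(M^m), indices 1..n
    P = M
    for m in range(1, n + 1):
        if m > 1:
            P = mat_mul(P, M)                  # P = M^m
        s.append(sum(P[i][i] for i in range(n)))
    tau = [Fraction(1)]                        # tau_0 = 1
    for m in range(1, n + 1):
        # Newton's formula: m tau_m + s_1 tau_{m-1} + ... + s_m tau_0 = 0
        tau.append(-sum(s[k] * tau[m - k] for k in range(1, m + 1)) / m)
    return tau
```

For instance, $\begin{pmatrix}2&1\\1&3\end{pmatrix}$ has characteristic polynomial $X^2-5X+5$.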
When the field has nonzero characteristic $p$, the Leverrier method may be employed only if $n<p$. Since $s_p = \sigma_1^p$, computing the $s_m$'s for $m>p$ does not bring any new information about the $\sigma_j$'s.



10.6 Exercises 



1. Given a polynomial P € M[X], use the Euclidean division in order to 
define a sequence of nonzero polynomials Pj in the following way. Set 
Po = P, Pi = P'- If Pj is not constant, —Pj+i is the remainder of 
the division of Pj-i by Pj-. Pj-i = QjPj — Pj+i, degPj+i < deg Pj. 

(a) Assume that $P$ has only simple roots. Show that the sequence $(P_j)_j$ is well-defined, that it has only finitely many terms, and that it is a Sturm sequence.

(b) Use Proposition 10.1.3 to compute the number of real roots of the real polynomials $X^2 + aX + b$ or $X^3 + pX + q$ in terms of their discriminants.
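For experimenting with Exercise 1, the following sketch builds the sequence $(P_j)$ with exact rational arithmetic. It counts real roots as $V(-\infty) - V(+\infty)$, the number of sign changes at $-\infty$ minus the number at $+\infty$. We assume this is how Proposition 10.1.3 applies on the whole real line. Polynomials are coefficient lists with the highest degree first, and all function names are ours.

```python
from fractions import Fraction

def poly_rem(a, b):
    """Remainder of the Euclidean division of a by b (b[0] != 0)."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b):
        q = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= q * b[i]
        a.pop(0)                      # leading coefficient is now zero
    while a and a[0] == 0:            # strip remaining leading zeros
        a.pop(0)
    return a

def derivative(p):
    d = len(p) - 1
    return [c * (d - i) for i, c in enumerate(p[:-1])]

def sturm_sequence(p):
    """P_0 = P, P_1 = P', and -P_{j+1} is the remainder of P_{j-1} by P_j."""
    seq = [[Fraction(c) for c in p], [Fraction(c) for c in derivative(p)]]
    while len(seq[-1]) > 1:           # P_j is not constant
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            raise ValueError("P has a multiple root")
        seq.append([-c for c in r])
    return seq

def sign_changes(values):
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u != v)

def number_of_real_roots(p):
    """V(-inf) - V(+inf), read off the leading coefficients and degrees."""
    seq = sturm_sequence(p)
    at_plus = [q[0] for q in seq]
    at_minus = [q[0] * (-1) ** (len(q) - 1) for q in seq]
    return sign_changes(at_minus) - sign_changes(at_plus)

print(number_of_real_roots([1, 0, -3, 1]))   # X^3 - 3X + 1: 3 real roots
print(number_of_real_roots([1, 0, 1, 1]))    # X^3 + X + 1: 1 real root
```

The two cubics have discriminants $-4p^3 - 27q^2$ equal to $81 > 0$ and $-31 < 0$, respectively. Trying further examples this way can guide part (b).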



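2. (J. Wilkinson [35], Section 5.45) Let $n = 2p - 1$ be an odd number and $W_n \in M_n(\mathbb{R})$ be the symmetric tridiagonal matrix

$$W_n = \begin{pmatrix} p & 1 & & & \\ 1 & p-1 & \ddots & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & p-1 & 1 \\ & & & 1 & p \end{pmatrix}.$$

The diagonal entries are thus $p, p-1, \ldots, 2, 1, 2, \ldots, p-1, p$, and the subdiagonal entries are equal to 1.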

(a) Show that the linear subspace

$$E' = \{x \in \mathbb{R}^n \mid x_{p+j} = x_{p-j},\ 1 \le j \le p-1\}$$

is stable under $W_n$. Similarly, show that the linear subspace

$$E'' = \{x \in \mathbb{R}^n \mid x_{p+j} = -x_{p-j},\ 0 \le j \le p-1\}$$

is stable under $W_n$.

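(b) Deduce that the spectrum of $W_n$ is the union of the spectra of the matrices

$$W_n' = \begin{pmatrix} p & 1 & & & \\ 1 & p-1 & \ddots & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & 2 & 1 \\ & & & 2 & 1 \end{pmatrix} \in M_p(\mathbb{R})$$

and

$$W_n'' = \begin{pmatrix} p & 1 & & & \\ 1 & p-1 & \ddots & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & 3 & 1 \\ & & & 1 & 2 \end{pmatrix} \in M_{p-1}(\mathbb{R}).$$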



(c) Show that the eigenvalues of $W_n''$ strictly separate those of $W_n'$.
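Parts (b) and (c) can be checked numerically for small $p$. The NumPy sketch below uses our own function names. The blocks are written in the coordinates $x_1, \ldots, x_p$ (for $E'$) and $x_1, \ldots, x_{p-1}$ (for $E''$, where $x_p = 0$). $W_n'$ is not symmetric in these coordinates, so its eigenvalues are computed with `eigvals` and their real parts are kept.

```python
import numpy as np

def wilkinson(p):
    """W_n, n = 2p - 1: diagonal p, p-1, ..., 1, ..., p-1, p; off-diagonal 1."""
    n = 2 * p - 1
    d = np.array([abs(i - (p - 1)) + 1 for i in range(n)], dtype=float)
    return np.diag(d) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

def even_block(p):
    """W_n on E' in the coordinates x_1, ..., x_p (a p x p matrix)."""
    W = np.diag(np.arange(p, 0, -1).astype(float))
    W += np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1)
    W[p - 1, p - 2] = 2.0            # x_{p+1} = x_{p-1} doubles this coupling
    return W

def odd_block(p):
    """W_n on E'' in the coordinates x_1, ..., x_{p-1}, using x_p = 0."""
    W = np.diag(np.arange(p, 1, -1).astype(float))
    W += np.diag(np.ones(p - 2), 1) + np.diag(np.ones(p - 2), -1)
    return W

p = 4
eig_W = np.linalg.eigvalsh(wilkinson(p))
eig_even = np.sort(np.linalg.eigvals(even_block(p)).real)
eig_odd = np.linalg.eigvalsh(odd_block(p))
print(np.allclose(np.sort(np.concatenate([eig_even, eig_odd])), eig_W))
print(all(eig_even[i] < eig_odd[i] < eig_even[i + 1] for i in range(p - 1)))
```

For large $p$ some pairs of eigenvalues become extremely close, which is the phenomenon Wilkinson's example is known for. A floating-point check of strict separation is therefore only meaningful for small $p$.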



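3. For $a_1, \ldots, a_n \in \mathbb{R}$, with $\sum_j a_j = 1$, form the matrix

$$M(a) = \begin{pmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_2 & b_2 & a_3 & \cdots & a_n \\ a_3 & a_3 & b_3 & \cdots & a_n \\ \vdots & \vdots & & \ddots & \vdots \\ a_n & a_n & \cdots & a_n & b_n \end{pmatrix},$$

where $b_j := a_1 + \cdots + a_{j-1} - (j-2)a_j$.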

(a) Compute the eigenvalues and the eigenvectors of $M(a)$.

(b) We restrict attention to $n$-tuples $a$ that belong to the simplex $S$ defined by $0 \le a_n \le \cdots \le a_1$ and $\sum_j a_j = 1$. Show that for $a \in S$, $M(a)$ is bistochastic and $b_2 - a_2 \le \cdots \le b_n - a_n \le 1$.

(c) Let $\mu_1, \ldots, \mu_n$ be an $n$-tuple of elements of $[0,1]$ with $\mu_1 = 1$. Show that there exists a unique $a$ in $S$ such that $\{\mu_1, \ldots, \mu_n\}$ is equal to the spectrum of $M(a)$ (counting with multiplicity).

(d) Consider the unit sphere $\mathcal{S}$ of $M_n(\mathbb{R})$, when this space is endowed with the norm $\|M\|_2 = \sqrt{\rho(M^TM)}$. Show that if $P \in \mathcal{S}$, then there exists a convex polytope $T$, of dimension $(n-1)^2$, included in $\mathcal{S}$ and containing $P$. Hint: Use Corollary 5.5.1, together with the unitary invariance of the norm $\|\cdot\|_2$.
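A small numerical experiment for parts (a) and (b) is sketched below in NumPy, with our own name `M_of_a`. It builds $M(a)$ for a point of the simplex, prints its row and column sums, and prints its spectrum next to the numbers $b_j - a_j$ that appear in (b).

```python
import numpy as np

def M_of_a(a):
    """M(a): entry (i, j), i != j, is a_{max(i, j)}; the diagonal is
    a_1, b_2, ..., b_n with b_j = a_1 + ... + a_{j-1} - (j - 2) a_j."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    idx = np.arange(n)
    M = a[np.maximum.outer(idx, idx)]
    for j in range(1, n):                 # 0-based j stands for b_{j+1}
        M[j, j] = a[:j].sum() - (j - 1) * a[j]
    return M

a = [0.5, 0.3, 0.2]                       # a point of the simplex S
M = M_of_a(a)
print(M.sum(axis=0), M.sum(axis=1))       # column and row sums
print(np.linalg.eigvalsh(M))              # spectrum of the symmetric matrix M(a)
print([M[j, j] - a[j] for j in range(1, len(a))])   # b_j - a_j for j >= 2
```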



4. Show that the cost of an iteration of the QR method for a Hermitian tridiagonal matrix is $20n + O(1)$.
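The following is a NumPy sketch of one unshifted QR step $T = QR \mapsto RQ$ for a real symmetric tridiagonal $T$. The names `givens` and `qr_step_tridiagonal` are ours. The factorization uses $n-1$ plane rotations, and each one touches only a bounded number of entries. For brevity, the second stage (forming $RQ$) updates whole columns, so this code does not itself achieve an $O(n)$ count. An efficient version would compute only the three diagonals of the result, which is again symmetric tridiagonal.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0], r >= 0."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_step_tridiagonal(T):
    """One unshifted QR step T = QR -> RQ for a real symmetric tridiagonal T."""
    A = np.array(T, dtype=float)
    n = A.shape[0]
    rotations = []
    # Q^T T = R: rotation k acts on rows k, k+1, where only columns
    # k, k+1, k+2 can be nonzero, so each rotation costs O(1).
    for k in range(n - 1):
        c, s = givens(A[k, k], A[k + 1, k])
        G = np.array([[c, s], [-s, c]])
        cols = slice(k, min(k + 3, n))
        A[k:k + 2, cols] = G @ A[k:k + 2, cols]
        rotations.append(G)
    # RQ = R G_0^T G_1^T ... G_{n-2}^T: rotation k mixes columns k and k+1.
    # Whole columns are updated here for simplicity (O(n) per rotation).
    for k, G in enumerate(rotations):
        A[:, k:k + 2] = A[:, k:k + 2] @ G.T
    return A

T = np.diag([2.0] * 5) + np.diag([1.0] * 4, 1) + np.diag([1.0] * 4, -1)
print(np.round(qr_step_tridiagonal(T), 4))
```

The diagonal of this $R$ is nonnegative. A library QR may choose other signs, and the resulting $RQ$ then differs only by a diagonal $\pm 1$ similarity.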

5. Show that the reduction of a Hermitian matrix to Hessenberg form (in this case, tridiagonal form) costs $7n^3/6 + O(n^2)$ operations.
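As an illustration for Exercise 5, here is a standard Householder reduction of a real symmetric matrix to tridiagonal form, sketched in NumPy. The function name is ours. The sketch performs full rank-one updates and does not exploit symmetry, so it shows the structure of the reduction ($n-2$ reflections, each applied on both sides) rather than reproducing the operation count asked for in the exercise.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Return H^T A H (H orthogonal), which is symmetric tridiagonal
    when A is symmetric; uses n - 2 Householder reflections."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k + 1:, k]
        alpha = -np.copysign(np.linalg.norm(x), x[0])   # avoids cancellation
        v = x.copy()
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:            # column already has the required zeros
            continue
        v /= norm_v
        # Apply I - 2 v v^T to rows k+1..n-1 (left) and columns k+1..n-1 (right).
        A[k + 1:, :] -= 2.0 * np.outer(v, v @ A[k + 1:, :])
        A[:, k + 1:] -= 2.0 * np.outer(A[:, k + 1:] @ v, v)
    return A

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
print(np.round(householder_tridiagonalize(B + B.T), 4))
```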



6. (Invariants of the QR algorithm) For $M \in M_n(\mathbb{R})$ and $1 \le k \le n-1$, denote by $(M)_k$ the matrix of size $(n-k)\times(n-k)$ obtained by deleting the first $k$ rows and the last $k$ columns. For example, $(I_n)_1$ is the Jordan matrix $J(0; n-1)$. We also denote by $K \in M_n(\mathbb{R})$ the matrix defined by $k_{1n} = 1$ and $k_{ij} = 0$ otherwise.
(a) For an upper triangular matrix $T$, compute $KT$ and $TK$ explicitly.

(b) Let $M \in M_n(\mathbb{R})$. Prove the equality

$$\det(M - \lambda I_n - \mu K) = (-1)^n\mu\det(M - \lambda I_n)_1 + \det(M - \lambda I_n).$$

(c) Let $A \in GL_n(\mathbb{R})$ be given, with factorization $A = QR$. Prove that

$$\det(A - \lambda I_n)_1 = \frac{\det R}{r_{nn}}\det(Q - \lambda R^{-1})_1.$$

(d) Let $A' = RQ$. Show that

$$r_{11}\det(A' - \lambda I_n)_1 = r_{nn}\det(A - \lambda I_n)_1.$$

(e) Generalize the previous calculation by replacing the index 1 by $k$. Deduce that the roots of the polynomial $\det(A - \lambda I_n)_k$ are conserved throughout the QR algorithm. How many such roots are there for a general matrix? How many for a Hessenberg matrix?
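One QR step can be used to check numerically the identity relating $(A' - \lambda I_n)_1$ and $(A - \lambda I_n)_1$, where $A' = RQ$. The NumPy sketch below uses our own helper names and compares $r_{11}\det(A' - \lambda I_n)_1$ with $r_{nn}\det(A - \lambda I_n)_1$ at a few values of $\lambda$. The derivation does not use the signs of the $r_{ii}$, so NumPy's QR factorization can be used as it is.

```python
import numpy as np

def truncated(M, k):
    """(M)_k: delete the first k rows and the last k columns of M."""
    n = M.shape[0]
    return M[k:, : n - k]

def check_identity(A, lam):
    """Return (r_11 det(A' - lam I)_1, r_nn det(A - lam I)_1) with A' = RQ."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A)
    A1 = R @ Q
    I = np.eye(n)
    lhs = R[0, 0] * np.linalg.det(truncated(A1 - lam * I, 1))
    rhs = R[-1, -1] * np.linalg.det(truncated(A - lam * I, 1))
    return lhs, rhs

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
for lam in (0.3, -1.2, 2.0):
    print(check_identity(A, lam))    # the two numbers agree up to round-off
```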

7. (Invariants, continued) For $M \in M_n(\mathbb{R})$, define $P_M(h; z) := \det((1-h)M + hM^T - zI_n)$.

(a) Show that $P_M(h; z) = P_M(1-h; z)$. Deduce that there exists a polynomial $Q_M$ such that $P_M(h; z) = Q_M(h(1-h); z)$.

(b) Show that $Q_M$ remains constant throughout the QR algorithm: if $Q \in O_n(\mathbb{R})$, $R$ is upper triangular, and $M = QR$, $N = RQ$, then $Q_M = Q_N$.

(c) Deduce that there exist polynomial functions $J_{rk}$ on $M_n(\mathbb{R})$, defined by

$$P_M(h; z) = \sum_{r=0}^{n}\sum_{k=0}^{[r/2]}(h(1-h))^k z^{n-r} J_{rk}(M),$$

that are invariant throughout the QR algorithm. Verify that the $J_{r0}$'s can be expressed in terms of invariants that we already know.

(d) Compute $J_{21}$ explicitly when $n = 2$. Deduce that in the case where Theorem 10.2.1 applies and $\det H > 0$, the matrix $A_k$ converges.

(e) Show that for $n \ge 2$,

$$J_{21}(M) = -\tfrac{1}{2}\operatorname{Tr}\left((M - M^T)^2\right).$$

Deduce that if $A_k$ converges to a diagonal matrix, then $A$ is symmetric.
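The quantity $-\frac12\operatorname{Tr}((M - M^T)^2)$ equals half the squared Frobenius norm of $M - M^T$. A QR step replaces $M$ by $RQ = Q^T(QR)Q$, which is an orthogonal similarity, so this quantity cannot change. The NumPy sketch below (our names, random test matrix) tracks it along ten unshifted QR steps.

```python
import numpy as np

def skew_invariant(M):
    """-(1/2) Tr((M - M^T)^2), i.e. half the squared Frobenius norm of M - M^T."""
    K = M - M.T
    return -0.5 * np.trace(K @ K)

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
values = [skew_invariant(A)]
for _ in range(10):
    Q, R = np.linalg.qr(A)
    A = R @ Q                        # one unshifted QR step: A <- Q^T A Q
    values.append(skew_invariant(A))
print(values)                        # all entries agree up to round-off
```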
8. In the Jacobi method, show that if the eigenvalues are simple, then the product $R_1 \cdots R_m$ converges to an orthogonal matrix $R$ such that $R^TAR$ is diagonal.
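The following is a sketch of the classical Jacobi method in NumPy that accumulates the product $R_1 R_2 \cdots$ of the rotations. The function name is ours. The pivot is chosen as the off-diagonal entry of largest modulus. The rotation angle is computed with the usual smaller-root formula, which gives $|\theta| \le \pi/4$. The pivot and angle rules used in the text may differ from these assumptions.

```python
import numpy as np

def jacobi_eigen(A, tol=1e-12, max_rotations=10_000):
    """Classical Jacobi method for a real symmetric A.
    Returns (diagonal of the final matrix, R) with R = R_1 R_2 ... R_m."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    R = np.eye(n)
    for _ in range(max_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(off), off.shape)
        if off[p, q] < tol:
            break
        # t = tan(theta) makes the (p, q) entry of J^T A J vanish; taking
        # the root of smaller modulus gives |theta| <= pi/4.
        tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
        t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + np.hypot(1.0, tau))
        c = 1.0 / np.hypot(1.0, t)
        s = t * c
        J = np.eye(n)
        J[p, p] = J[q, q] = c
        J[p, q], J[q, p] = s, -s
        A = J.T @ A @ J
        R = R @ J                    # accumulate the product of rotations
    return np.diag(A), R

S = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 1.0]])
d, R = jacobi_eigen(S)
print(np.sort(d), np.linalg.eigvalsh(S))   # the two lists agree
```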



9. Extend the Jacobi method to Hermitian matrices. Hint: Replace the rotation matrices

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$

by unitary matrices

$$\begin{pmatrix} z_1 & z_2 \\ z_3 & z_4 \end{pmatrix}.$$

10. Let $A \in \mathrm{Sym}_n(\mathbb{R})$ be a matrix whose eigenvalues (which are of course real) are simple. Apply the Jacobi method, but select the angle $\theta_k$ so that $\pi/4 \le |\theta_k| \le \pi/2$.

(a) Show that $E_k$ tends to zero, that the sequence $(D_k)$ is relatively compact, and that its cluster values are diagonal matrices whose diagonal terms are the eigenvalues of $A$.

(b) Show that an iteration asymptotically permutes $a_{pp}^{(k)}$ and $a_{qq}^{(k)}$, where $(p,q) = (p_k, q_k)$. In other words,

$$\lim_{k\to+\infty}\left(a_{pp}^{(k+1)} - a_{qq}^{(k)}\right) = 0,$$

and vice versa, permuting $p$ and $q$.



11. The Bernoulli method computes an approximation of the root of largest modulus of a polynomial $a_0X^n + \cdots + a_n$, when that root is unique. To do so, one defines a sequence by a linear recurrence of order $n$:

$$z_k = -\frac{1}{a_0}\left(a_1z_{k-1} + \cdots + a_nz_{k-n}\right).$$

Compare this method with the power method for a suitable matrix.
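Below is a sketch of Bernoulli's recurrence in plain Python (`bernoulli_root` is our name). The update $(z_{k-n}, \ldots, z_{k-1}) \mapsto (z_{k-n+1}, \ldots, z_k)$ is multiplication by the companion matrix of the polynomial, so the loop is the power method for that matrix. Rescaling the window plays the role of normalizing the iterate. We assume real coefficients, $n \ge 2$, and a unique root of largest modulus, which is then necessarily real.

```python
import math

def bernoulli_root(a, iterations=200):
    """Bernoulli's method for a_0 X^n + ... + a_n, with a = [a_0, ..., a_n], n >= 2.
    Returns z_k / z_{k-1}, which tends to the unique root of largest modulus."""
    n = len(a) - 1
    window = [0.0] * (n - 1) + [1.0]           # z_0, ..., z_{n-1}
    for _ in range(iterations):
        # z_k = -(a_1 z_{k-1} + ... + a_n z_{k-n}) / a_0
        z = -sum(a[i] * window[-i] for i in range(1, n + 1)) / a[0]
        window = window[1:] + [z]
        # The recurrence is linear, so rescaling the window changes no ratio
        # and prevents overflow (normalization step of the power method).
        scale = max(abs(w) for w in window)
        window = [w / scale for w in window]
    return window[-1] / window[-2]

print(bernoulli_root([1, -3, 2]))                                # roots 1, 2
print(bernoulli_root([1, 0, -3, 1]), 2 * math.cos(8 * math.pi / 9))
```

As with the power method, the error decreases geometrically, with ratio the quotient of the two largest root moduli.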

12. Consider the power method for a matrix $M \in M_n(\mathbb{C})$ of which several eigenvalues are of modulus $\rho(M) \ne 0$. Again, $\mathbb{C}^n = E \oplus F$ is the decomposition of $\mathbb{C}^n$ into linear subspaces stable under $M$, such that $\rho(M|_F) < \rho(M)$ and $\lambda \in \operatorname{Sp}(M|_E) \Rightarrow |\lambda| = \rho(M)$. Finally, $x^0 = y^0 + z^0$ with $y^0 \in E$, $z^0 \in F$, and $y^0 \ne 0$.

(a) Express

$$\frac{1}{m}\sum_{k=0}^{m-1}\log\|Mx^k\|$$

in terms of
(b) Show that if $0 < \ldots < \rho(M) < \eta$, then there exist constants $C, C'$ such that ... for all $k \in \mathbb{N}$.

(c) Deduce that $\log\|Mx^k\|$ converges in the mean to $\log\rho(M)$.

13. Let $M \in M_n(\mathbb{C})$ be given. Assume that the Gershgorin disk $D_1$ is disjoint from the other disks $D_m$, $m \ne 1$. Show that the inverse power method, applied to $M - m_{11}I_n$, provides an approximation of the unique eigenvalue of $M$ that belongs to $D_1$.
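Below is a NumPy sketch for Exercise 13. The function `inverse_power` and the test matrix are ours. The power method is applied to $(M - m_{11}I_n)^{-1}$. Under the Gershgorin hypothesis, the eigenvalue $\lambda \in D_1$ is the one closest to $m_{11}$, so $1/(\lambda - m_{11})$ dominates. As one possible way to read off $\lambda$, the sketch returns the Rayleigh quotient $x^*Mx$ of the normalized iterate.

```python
import numpy as np

def inverse_power(M, shift, iterations=50):
    """Power method for (M - shift I)^{-1}; returns the Rayleigh quotient
    x^* M x of the final unit vector x as an eigenvalue estimate."""
    n = M.shape[0]
    B = M - shift * np.eye(n)
    x = np.ones(n, dtype=complex) / np.sqrt(n)
    for _ in range(iterations):
        y = np.linalg.solve(B, x)        # y = (M - shift I)^{-1} x
        x = y / np.linalg.norm(y)
    return np.vdot(x, M @ x)             # vdot conjugates its first argument

M = np.array([[10.0, 1.0, 0.5],
              [0.2, 2.0, 0.3],
              [0.1, 0.4, -3.0]])
# Gershgorin disks: D_1 = D(10, 1.5), D_2 = D(2, 0.5), D_3 = D(-3, 0.5)
lam = inverse_power(M, M[0, 0])
print(lam)
```

In practice one would factor $M - m_{11}I_n$ once and reuse the factorization at every step. `np.linalg.solve` is called inside the loop only to keep the sketch short.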